Apr 30 08:29:44 fir-md1-s1 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4 Apr 30 08:29:44 fir-md1-s1 kernel: alg: No test for adler32 (adler32-zlib) Apr 30 08:29:45 fir-md1-s1 kernel: Lustre: Lustre: Build Version: 2.12.0.pl9 Apr 30 08:29:45 fir-md1-s1 kernel: LNet: Using FastReg for registration Apr 30 08:29:45 fir-md1-s1 kernel: LNet: Added LNI 10.0.10.51@o2ib7 [8/256/0/180] Apr 30 08:29:45 fir-md1-s1 kernel: LNetError: 7253:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.106@o2ib7 on NA (ib0:1:10.0.10.51): bad dst nid 10.0.10.51@o2ib7 Apr 30 08:30:47 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc Apr 30 08:30:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1153d653-edd7-2ca8-72ae-5f213cd0d2c4 (at 0@lo) Apr 30 08:30:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 74828ce0-ea4f-f77f-83f0-8f7bbd9d7f10 (at 10.8.27.17@o2ib6) Apr 30 08:30:48 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Apr 30 08:30:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b4679b48-b11c-9680-7d24-915c2e7a6a0e (at 10.8.24.18@o2ib6) Apr 30 08:30:49 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Apr 30 08:30:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2cf55f95-62ef-57e6-6781-995d0b924cb6 (at 10.8.7.22@o2ib6) Apr 30 08:30:51 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages Apr 30 08:30:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.15@o2ib4) Apr 30 08:30:57 fir-md1-s1 kernel: Lustre: Skipped 145 previous similar messages Apr 30 08:31:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a19fbd52-fc1f-6afe-5025-88bbd6370298 (at 10.9.102.36@o2ib4) Apr 30 08:31:06 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Apr 30 08:31:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0c2afddf-dbbe-bdc6-2650-d15e478dbb2e (at 
10.9.102.4@o2ib4) Apr 30 08:31:22 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Apr 30 08:31:23 fir-md1-s1 kernel: LDISKFS-fs (dm-4): file extents enabled Apr 30 08:31:23 fir-md1-s1 kernel: LDISKFS-fs (dm-0): file extents enabled Apr 30 08:31:23 fir-md1-s1 kernel: , maximum tree depth=5 Apr 30 08:31:23 fir-md1-s1 kernel: , maximum tree depth=5 Apr 30 08:31:23 fir-md1-s1 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 30 08:31:23 fir-md1-s1 kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 30 08:31:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.103.37@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 30 08:31:24 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Apr 30 08:31:24 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0000: operation mds_connect to node 10.0.10.52@o2ib7 failed: rc = -114 Apr 30 08:31:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Imperative Recovery not enabled, recovery window 300-900 Apr 30 08:31:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.1.3@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 30 08:31:24 fir-md1-s1 kernel: LustreError: Skipped 34 previous similar messages Apr 30 08:31:24 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on Apr 30 08:31:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: in recovery but waiting for the first client to connect Apr 30 08:31:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Will be in recovery for at least 5:00, or until 1334 clients reconnect Apr 30 08:31:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.25.27@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Apr 30 08:31:25 fir-md1-s1 kernel: LustreError: Skipped 65 previous similar messages Apr 30 08:31:26 fir-md1-s1 kernel: LustreError: 101695:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff982bb1bc0900 x1631600671074432/t0(0) o601->fir-MDT0000-lwp-OST0022_UUID@10.0.10.105@o2ib7:2/0 lens 336/0 e 0 to 0 dl 1556638292 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 30 08:31:27 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation mds_connect to node 0@lo failed: rc = -114 Apr 30 08:31:27 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Apr 30 08:31:27 fir-md1-s1 kernel: LustreError: 101983:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff982ccf81c200 x1631600671088896/t0(0) o601->fir-MDT0000-lwp-OST001e_UUID@10.0.10.105@o2ib7:3/0 lens 336/0 e 0 to 0 dl 1556638293 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 30 08:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery not enabled, recovery window 300-900 Apr 30 08:31:27 fir-md1-s1 kernel: LustreError: 101983:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 77 previous similar messages Apr 30 08:31:27 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Apr 30 08:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: in recovery but waiting for the first client to connect Apr 30 08:31:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.102.70@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Apr 30 08:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 5:00, or until 1334 clients reconnect
Apr 30 08:31:27 fir-md1-s1 kernel: LustreError: Skipped 176 previous similar messages
Apr 30 08:31:28 fir-md1-s1 kernel: LustreError: 101980:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff982b4e798900 x1631600671093920/t0(0) o601->fir-MDT0000-lwp-OST001e_UUID@10.0.10.105@o2ib7:4/0 lens 336/0 e 0 to 0 dl 1556638294 ref 1 fl Interpret:/0/ffffffff rc 0/-1
Apr 30 08:31:28 fir-md1-s1 kernel: LustreError: 101980:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 51 previous similar messages
Apr 30 08:31:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Denying connection for new client e6faa00b-070f-4d22-51ac-e59042b5a00c(at 10.8.12.33@o2ib6), waiting for 1334 known clients (96 recovered, 8 in progress, and 0 evicted) already passed deadline 0:04
Apr 30 08:31:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Apr 30 08:31:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.102.34@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:31 fir-md1-s1 kernel: LustreError: 101980:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff982c7deb0300 x1631600671094000/t0(0) o601->fir-MDT0000-lwp-OST0022_UUID@10.0.10.105@o2ib7:7/0 lens 336/0 e 0 to 0 dl 1556638297 ref 1 fl Interpret:/0/ffffffff rc 0/-1
Apr 30 08:31:31 fir-md1-s1 kernel: LustreError: 101980:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 4 previous similar messages
Apr 30 08:31:31 fir-md1-s1 kernel: LustreError: Skipped 477 previous similar messages
Apr 30 08:31:36 fir-md1-s1 kernel: LustreError: 101980:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff98267623ac50 x1631600796636864/t0(0) o601->fir-MDT0000-lwp-OST0007_UUID@10.0.10.102@o2ib7:12/0 lens 336/0 e 0 to 0 dl 1556638302 ref 1 fl Interpret:/0/ffffffff rc 0/-1
Apr 30 08:31:36 fir-md1-s1 kernel: LustreError: 101980:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 2 previous similar messages
Apr 30 08:31:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.0.10.52@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:42 fir-md1-s1 kernel: LustreError: Skipped 1007 previous similar messages Apr 30 08:31:45 fir-md1-s1 kernel: LustreError: 101983:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff982beef62400 x1631600671095088/t0(0) o601->fir-MDT0000-lwp-OST0022_UUID@10.0.10.105@o2ib7:21/0 lens 336/0 e 0 to 0 dl 1556638311 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 30 08:31:45 fir-md1-s1 kernel: LustreError: 101983:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 3 previous similar messages Apr 30 08:31:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to db3bd310-8795-965d-3dea-b9fd8474f855 (at 10.0.10.101@o2ib7) Apr 30 08:31:58 fir-md1-s1 kernel: Lustre: Skipped 3645 previous similar messages Apr 30 08:32:02 fir-md1-s1 kernel: LustreError: 102201:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff9846563acb00 x1631600706336656/t0(0) o601->fir-MDT0000-lwp-OST0004_UUID@10.0.10.101@o2ib7:8/0 lens 336/0 e 0 to 0 dl 1556638328 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 30 08:32:02 fir-md1-s1 kernel: LustreError: 101699:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff984ca6f15d00 x1631600706336640/t0(0) o601->fir-MDT0000-lwp-OST0004_UUID@10.0.10.101@o2ib7:8/0 lens 336/0 e 0 to 0 dl 1556638328 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 30 08:32:02 fir-md1-s1 kernel: LustreError: 101699:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 955 previous similar messages Apr 30 08:32:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Denying connection for new client e6faa00b-070f-4d22-51ac-e59042b5a00c(at 10.8.12.33@o2ib6), waiting for 1334 known clients (1239 recovered, 87 in progress, and 0 evicted) already passed deadline 0:39 Apr 30 08:32:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 08:32:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery already passed deadline 0:53. 
If you do not want to wait more, please abort the recovery by force. Apr 30 08:32:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 0:53, of 1334 clients 1334 recovered and 0 were evicted. Apr 30 08:32:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 0:51, of 1334 clients 1334 recovered and 0 were evicted. Apr 30 08:36:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a3a1deb5-f17f-91e3-ffe6-f046415da924 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cc7bd6800, cur 1556638576 expire 1556638426 last 1556638349 Apr 30 08:36:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e6faa00b-070f-4d22-51ac-e59042b5a00c (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98378ba86800, cur 1556638579 expire 1556638429 last 1556638352 Apr 30 08:37:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3a1deb5-f17f-91e3-ffe6-f046415da924 (at 10.8.12.33@o2ib6) Apr 30 08:37:24 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Apr 30 08:41:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a97fecac-5576-d099-2d6d-ba3e8ee376e1 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984cabf02c00, cur 1556638871 expire 1556638721 last 1556638644 Apr 30 08:41:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 08:41:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d443b2d2-ef37-1815-642c-90bcf5846a13 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983c7a3d0000, cur 1556638874 expire 1556638724 last 1556638647 Apr 30 08:42:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3a1deb5-f17f-91e3-ffe6-f046415da924 (at 10.8.12.33@o2ib6) Apr 30 08:42:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 08:46:06 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ed3d9ca0-da54-f70a-e918-7d51b4894506 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9827afa23000, cur 1556639166 expire 1556639016 last 1556638939 Apr 30 08:46:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 08:47:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3a1deb5-f17f-91e3-ffe6-f046415da924 (at 10.8.12.33@o2ib6) Apr 30 08:47:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 08:51:02 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 054d8096-224c-690f-3549-b7034c5669d6 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985c70a54c00, cur 1556639462 expire 1556639312 last 1556639235 Apr 30 08:51:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 08:52:23 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556639536/real 1556639536] req@ffff985a6332e000 x1632253444072512/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556639543 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Apr 30 08:52:30 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556639543/real 1556639543] req@ffff985a6332e000 x1632253444072512/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556639550 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 08:52:31 fir-md1-s1 kernel: Lustre: 102606:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984c85e48300 x1631625331264144/t0(0) o101->a1b347d2-59e9-5de2-6e29-00884610c229@10.8.25.22@o2ib6:6/0 lens 592/3264 e 1 to 0 dl 1556639556 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 08:52:32 fir-md1-s1 kernel: Lustre: 102420:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b9900c500 x1631776836655760/t0(0) o101->74b6aa20-82e6-c3c9-1fe9-141d2e6b56e2@10.8.26.30@o2ib6:6/0 lens 592/3264 e 1 to 0 dl 1556639556 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 08:52:32 fir-md1-s1 kernel: Lustre: 102420:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 30 08:52:33 fir-md1-s1 kernel: Lustre: 101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b9900e900 x1631496285003632/t0(0) o101->53803d2a-ea9e-0335-702a-3d9daed0d916@10.8.22.17@o2ib6:7/0 lens 592/3264 e 1 to 0 dl 1556639557 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 08:52:33 fir-md1-s1 kernel: Lustre: 
101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 30 08:52:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting Apr 30 08:52:37 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556639550/real 1556639550] req@ffff985a6332e000 x1632253444072512/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556639557 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 08:52:38 fir-md1-s1 kernel: Lustre: 102530:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b96b5b600 x1631695967233984/t0(0) o101->acb1aa3b-60ab-7f7c-ec38-03838117cd24@10.8.25.12@o2ib6:13/0 lens 592/3264 e 1 to 0 dl 1556639563 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 08:52:38 fir-md1-s1 kernel: Lustre: 102530:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 30 08:52:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 53803d2a-ea9e-0335-702a-3d9daed0d916 (at 10.8.22.17@o2ib6) reconnecting Apr 30 08:52:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Apr 30 08:52:42 fir-md1-s1 kernel: Lustre: 102590:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983c3e7d4b00 x1631795301549552/t0(0) o101->7b47d238-4d96-3180-1efb-43deab0e7ece@10.8.24.19@o2ib6:17/0 lens 592/3264 e 1 to 0 dl 1556639567 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 08:52:44 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556639557/real 1556639557] req@ffff985a6332e000 x1632253444072512/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556639564 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 08:52:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
acb1aa3b-60ab-7f7c-ec38-03838117cd24 (at 10.8.25.12@o2ib6) reconnecting Apr 30 08:52:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 08:52:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 7b47d238-4d96-3180-1efb-43deab0e7ece (at 10.8.24.19@o2ib6) reconnecting Apr 30 08:52:50 fir-md1-s1 kernel: Lustre: 102370:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b66e8bc50 x1631686241233568/t0(0) o101->e069e613-f413-14c2-adc9-8bb2c0565535@10.8.20.30@o2ib6:25/0 lens 592/3264 e 1 to 0 dl 1556639575 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 08:52:50 fir-md1-s1 kernel: Lustre: 102370:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 30 08:52:51 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556639564/real 1556639564] req@ffff985a6332e000 x1632253444072512/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556639571 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 08:52:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b7ad475a-9cfb-ec6c-4413-22a589606837 (at 10.8.11.3@o2ib6) reconnecting Apr 30 08:53:05 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556639578/real 1556639578] req@ffff985a6332e000 x1632253444072512/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556639585 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 08:53:05 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 30 08:53:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client acb1aa3b-60ab-7f7c-ec38-03838117cd24 (at 10.8.25.12@o2ib6) reconnecting Apr 30 08:53:05 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Apr 30 08:53:17 fir-md1-s1 kernel: Lustre: 102420:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ 
Couldn't add any time (5/-5), not sending early reply req@ffff982acaebec00 x1631730847137280/t0(0) o101->d4d733ff-8d4b-d8de-bbc6-b5ae7cc529ba@10.8.12.35@o2ib6:22/0 lens 592/3264 e 0 to 0 dl 1556639602 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 08:53:17 fir-md1-s1 kernel: Lustre: 102420:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Apr 30 08:53:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d4d733ff-8d4b-d8de-bbc6-b5ae7cc529ba (at 10.8.12.35@o2ib6) reconnecting Apr 30 08:53:23 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Apr 30 08:53:26 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556639599/real 1556639599] req@ffff985a6332e000 x1632253444072512/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556639606 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 08:53:26 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Apr 30 08:53:46 fir-md1-s1 kernel: LustreError: 102394:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556639536, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff983cf8ffad00/0xce8853847d1bd099 lrc: 3/1,0 mode: --/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 29 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102394 timeout: 0 lvb_type: 0 Apr 30 08:53:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556639626.102394 Apr 30 08:53:47 fir-md1-s1 kernel: LustreError: 102571:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556639537, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff984c67232ac0/0xce8853847d1d0931 lrc: 3/1,0 mode: --/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 29 type: IBT 
flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102571 timeout: 0 lvb_type: 0 Apr 30 08:53:47 fir-md1-s1 kernel: LustreError: 102571:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 30 08:53:53 fir-md1-s1 kernel: LustreError: 102532:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556639543, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff982cd00bb840/0xce8853847d27cb6b lrc: 3/1,0 mode: --/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102532 timeout: 0 lvb_type: 0 Apr 30 08:53:53 fir-md1-s1 kernel: LustreError: 102532:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 30 08:53:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b7ad475a-9cfb-ec6c-4413-22a589606837 (at 10.8.11.3@o2ib6) reconnecting Apr 30 08:53:56 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages Apr 30 08:53:56 fir-md1-s1 kernel: Lustre: 102702:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983cc0a06000 x1631684958101408/t0(0) o101->8455dbbf-4366-afd6-29b8-dc2a91bfd5f9@10.8.11.12@o2ib6:1/0 lens 592/3264 e 0 to 0 dl 1556639641 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 08:53:56 fir-md1-s1 kernel: Lustre: 102702:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 30 08:53:57 fir-md1-s1 kernel: LustreError: 102403:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556639547, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff983ca864d100/0xce8853847d2e6669 lrc: 3/1,0 mode: --/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 34 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102403 timeout: 
0 lvb_type: 0 Apr 30 08:54:01 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556639634/real 1556639634] req@ffff985a6332e000 x1632253444072512/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556639641 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 08:54:01 fir-md1-s1 kernel: Lustre: 102380:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Apr 30 08:54:02 fir-md1-s1 kernel: LustreError: 102606:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556639552, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff984c83b29200/0xce8853847d370a85 lrc: 3/1,0 mode: --/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 35 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102606 timeout: 0 lvb_type: 0 Apr 30 08:54:17 fir-md1-s1 kernel: LustreError: 102649:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556639567, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff983c2c2cd580/0xce8853847d4fe404 lrc: 3/1,0 mode: --/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 37 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102649 timeout: 0 lvb_type: 0 Apr 30 08:54:17 fir-md1-s1 kernel: LustreError: 102649:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 30 08:54:34 fir-md1-s1 kernel: LustreError: 102488:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556639584, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff982816623600/0xce8853847d6c2565 lrc: 3/1,0 mode: --/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 40 type: IBT flags: 0x40210000000000 nid: local remote: 
0x0 expref: -99 pid: 102488 timeout: 0 lvb_type: 0 Apr 30 08:54:34 fir-md1-s1 kernel: LustreError: 102488:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Apr 30 08:54:51 fir-md1-s1 kernel: LustreError: 102380:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.10.29@o2ib6) failed to reply to blocking AST (req@ffff985a6332e000 x1632253444072512 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9835c2218240/0xce885384746ac067 lrc: 4/0,0 mode: PR/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 41 type: IBT flags: 0x60200400000020 nid: 10.8.10.29@o2ib6 remote: 0xe520aafcd65657d3 expref: 75 pid: 102611 timeout: 165104 lvb_type: 0 Apr 30 08:54:51 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.10.29@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Apr 30 08:54:51 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.10.29@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff9835c2218240/0xce885384746ac067 lrc: 3/0,0 mode: PR/PR res: [0x20001e803:0x12c:0x0].0x0 bits 0x13/0x0 rrc: 41 type: IBT flags: 0x60200400000020 nid: 10.8.10.29@o2ib6 remote: 0xe520aafcd65657d3 expref: 76 pid: 102611 timeout: 0 lvb_type: 0 Apr 30 08:54:51 fir-md1-s1 kernel: Lustre: 102532:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (147:1s); client may timeout. req@ffff982b96b5b600 x1631695967233984/t0(0) o101->acb1aa3b-60ab-7f7c-ec38-03838117cd24@10.8.25.12@o2ib6:13/0 lens 592/536 e 1 to 0 dl 1556639690 ref 1 fl Complete:/0/0 rc 0/0 Apr 30 08:54:51 fir-md1-s1 kernel: Lustre: 102532:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 30 08:54:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c61a7e99-2daf-0d4e-4d5e-683948263561 (at 10.8.12.33@o2ib6) in 158 seconds. I think it's dead, and I am evicting it. 
exp ffff984bfeb77c00, cur 1556639692 expire 1556639542 last 1556639534 Apr 30 08:54:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 08:57:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3a1deb5-f17f-91e3-ffe6-f046415da924 (at 10.8.12.33@o2ib6) Apr 30 08:57:02 fir-md1-s1 kernel: Lustre: Skipped 133 previous similar messages Apr 30 08:58:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e8c62915-6c71-4c14-7626-f3601bfc0f0f (at 10.8.1.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985953787400, cur 1556639891 expire 1556639741 last 1556639664 Apr 30 08:58:11 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Apr 30 09:00:49 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c3ea7cff-a2b9-6fc8-d2a8-f427f845bf09 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cf2eb2c00, cur 1556640049 expire 1556639899 last 1556639822 Apr 30 09:00:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 09:05:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d7a74b87-bc65-826f-a750-c8a80cc8482a (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cd48f0c00, cur 1556640347 expire 1556640197 last 1556640120 Apr 30 09:05:47 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Apr 30 09:27:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3a1deb5-f17f-91e3-ffe6-f046415da924 (at 10.8.12.33@o2ib6) Apr 30 09:27:10 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Apr 30 09:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b0015eb5-6efa-a3bc-bfd9-109e877d2725 (at 10.8.11.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9859f2201000, cur 1556641731 expire 1556641581 last 1556641504 Apr 30 09:28:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 09:29:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.1.3@o2ib6) Apr 30 09:29:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 09:30:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3eee9c96-c1b6-76d2-95c6-7eafc5882ffc (at 10.8.12.33@o2ib6) in 174 seconds. I think it's dead, and I am evicting it. exp ffff983c4e38b000, cur 1556641807 expire 1556641657 last 1556641633 Apr 30 09:30:07 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Apr 30 09:34:40 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5f3e56aa-c133-8c57-1770-33b09b5d7956 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c02f49400, cur 1556642080 expire 1556641930 last 1556641853 Apr 30 09:34:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 10:02:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e80f6b46-7bcd-30a8-8491-3102d8ee0aa0 (at 10.8.25.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984709677000, cur 1556643773 expire 1556643623 last 1556643546 Apr 30 10:02:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 10:02:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 296150f1-2afd-83d5-ffa7-39277d8476e6 (at 10.8.25.9@o2ib6) Apr 30 10:02:59 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Apr 30 11:40:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6f8d180c-697b-87fe-2c39-64e5c1d542ef (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984cf8671800, cur 1556649642 expire 1556649492 last 1556649415 Apr 30 11:40:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 11:41:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 627e76d3-c7eb-8d97-a64e-cb771e022cc0 (at 10.8.26.33@o2ib6) Apr 30 11:41:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 11:54:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e4faccdb-f303-9bdd-51a6-ad7a646ae559 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982cb4a1a400, cur 1556650484 expire 1556650334 last 1556650257 Apr 30 11:54:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 11:54:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e4faccdb-f303-9bdd-51a6-ad7a646ae559 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98375420c400, cur 1556650498 expire 1556650348 last 1556650271 Apr 30 11:54:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 11:55:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 627e76d3-c7eb-8d97-a64e-cb771e022cc0 (at 10.8.26.33@o2ib6) Apr 30 11:55:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 12:09:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4b625669-b570-bf89-cdc9-b22aede67358 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982cd197ec00, cur 1556651382 expire 1556651232 last 1556651155 Apr 30 12:12:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 12:12:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 12:20:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 327c28a1-fe51-b704-2bde-95368e501f01 (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985cfb382c00, cur 1556652015 expire 1556651865 last 1556651788 Apr 30 12:20:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 12:22:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.1.29@o2ib6) Apr 30 12:22:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 12:24:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 459dc0b2-5f7f-24eb-f6a1-6e1030b48b5c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9858c6edb000, cur 1556652254 expire 1556652104 last 1556652027 Apr 30 12:24:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 12:24:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 459dc0b2-5f7f-24eb-f6a1-6e1030b48b5c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cf7852800, cur 1556652256 expire 1556652106 last 1556652029 Apr 30 12:32:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 12:32:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 12:41:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client acdb8e1f-3ab2-f130-36a6-60883f4fd9c7 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cc6f8c400, cur 1556653262 expire 1556653112 last 1556653035 Apr 30 12:41:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 12:42:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 12:42:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 13:03:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f4319f02-25fa-20b6-a648-2516dd1744d4 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983c4af49800, cur 1556654628 expire 1556654478 last 1556654401 Apr 30 13:03:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 13:06:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 13:06:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 13:12:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2df8c138-8c23-ea09-17c8-c9239f9279b9 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982b28a92800, cur 1556655144 expire 1556654994 last 1556654917 Apr 30 13:12:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 13:14:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.13.24@o2ib6) Apr 30 13:14:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 13:14:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 919649d8-704c-889b-d1dd-a296af8855ee (at 10.8.13.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c307d1800, cur 1556655298 expire 1556655148 last 1556655071 Apr 30 13:14:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 13:37:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 13:37:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 13:51:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d9eb9aeb-03ec-18ce-b78d-5769086dc54d (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985c3ca43400, cur 1556657485 expire 1556657335 last 1556657258 Apr 30 13:51:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 13:54:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 13:54:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 14:02:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4115a52d-9eff-7ac8-6fc7-05e10e61ece9 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983c3b71ec00, cur 1556658174 expire 1556658024 last 1556657947 Apr 30 14:02:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 14:06:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 14:06:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 14:16:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cb2147ca-63ad-9be6-4549-5a3714e7a68f (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983bb7820400, cur 1556659013 expire 1556658863 last 1556658786 Apr 30 14:16:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 14:21:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 14:21:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 14:32:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 49903e14-e267-be84-fd5c-bda9815f9fe4 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984be9741000, cur 1556659963 expire 1556659813 last 1556659736 Apr 30 14:32:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 14:36:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 14:36:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 15:12:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ec01363a-b910-254e-075d-e7f3e6df1606 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98376638dc00, cur 1556662335 expire 1556662185 last 1556662108 Apr 30 15:12:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 15:12:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ec01363a-b910-254e-075d-e7f3e6df1606 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984ca8a6c800, cur 1556662350 expire 1556662200 last 1556662123 Apr 30 15:12:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 15:17:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 15:17:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 15:17:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.10.29@o2ib6) Apr 30 15:17:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 15:39:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a078cd0f-7e7e-03be-ddc4-775ce28fae96 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9859b7ac6c00, cur 1556663943 expire 1556663793 last 1556663716 Apr 30 15:39:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a078cd0f-7e7e-03be-ddc4-775ce28fae96 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985cf8216800, cur 1556663955 expire 1556663805 last 1556663728 Apr 30 15:39:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 15:39:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) Apr 30 15:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) Apr 30 15:39:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 30 16:19:42 fir-md1-s1 kernel: LNetError: 101314:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Apr 30 16:20:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1113817e-df5c-b4f9-94ce-f33b62a1b499 (at 10.8.8.26@o2ib6) reconnecting Apr 30 16:20:13 fir-md1-s1 kernel: Lustre: Skipped 66 previous similar messages Apr 30 16:20:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 520279e1-9bd2-c069-cb21-1e3d5370b323 (at 10.8.8.26@o2ib6) Apr 30 16:36:39 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Apr 30 16:36:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting Apr 30 16:36:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.27.24@o2ib6) Apr 30 16:42:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1d5157f9-7efa-61ef-c6f9-b1db29ae7243 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984a85b69800, cur 1556667771 expire 1556667621 last 1556667544 Apr 30 16:43:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) Apr 30 16:46:46 fir-md1-s1 kernel: LNetError: 101310:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Apr 30 16:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting Apr 30 16:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.27.24@o2ib6) Apr 30 16:47:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 17:05:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 53385b7b-a550-b1a8-0abe-3b8ac836eb95 (at 10.8.10.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cb6f23800, cur 1556669152 expire 1556669002 last 1556668925 Apr 30 17:05:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 30 17:08:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.20@o2ib6) Apr 30 22:33:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6fab2fb5-26e6-7b9a-b3d9-fd518701970b (at 10.8.14.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98465aa59000, cur 1556688803 expire 1556688653 last 1556688576 Apr 30 22:33:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 08:20:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6b2f9741-e509-4243-058a-e7872e15cb5c (at 10.8.1.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984cf8677c00, cur 1556724020 expire 1556723870 last 1556723793 May 01 08:20:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 08:22:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.1.31@o2ib6) May 01 08:22:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 08:39:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d64a3116-c9b2-082e-3250-a9e5dffa1cb1 (at 10.8.14.5@o2ib6) May 01 08:39:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 08:39:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to cc6b3c33-4b63-0ce5-d300-e50e75e32d79 (at 10.8.13.23@o2ib6) May 01 08:39:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 08:39:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.13.24@o2ib6) May 01 08:39:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 08:55:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a3a1deb5-f17f-91e3-ffe6-f046415da924 (at 10.8.12.33@o2ib6) May 01 08:55:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 10:10:43 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556730636/real 1556730636] req@ffff9838e8673c00 x1632253887884704/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556730643 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 01 10:10:43 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 01 10:10:51 fir-md1-s1 kernel: Lustre: 102371:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983cd98bc800 x1631565217852128/t0(0) o101->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:26/0 lens 1800/3288 e 1 to 0 dl 1556730656 ref 2 fl Interpret:/0/0 rc 0/0 May 01 10:10:51 fir-md1-s1 kernel: Lustre: 
102371:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages May 01 10:10:57 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556730650/real 1556730650] req@ffff9838e8673c00 x1632253887884704/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556730657 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 01 10:10:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting May 01 10:10:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.24@o2ib6) May 01 10:10:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 10:10:57 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 01 10:11:11 fir-md1-s1 kernel: LustreError: 102595:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff9838e8673c00 x1632253887884704 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff983c72eb6e40/0xce8853875de4f9ed lrc: 4/0,0 mode: PR/PR res: [0x20000560a:0xaeb:0x0].0x0 bits 0x13/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xb5891fec25bff6c1 expref: 157 pid: 102522 timeout: 255965 lvb_type: 0 May 01 10:11:11 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 01 10:11:11 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff983c72eb6e40/0xce8853875de4f9ed lrc: 3/0,0 mode: PR/PR res: [0x20000560a:0xaeb:0x0].0x0 bits 0x13/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xb5891fec25bff6c1 expref: 158 pid: 102522 timeout: 0 lvb_type: 0 May 
01 10:14:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c002d779-213f-8764-b0ce-a364b557d98d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c736d7800, cur 1556730845 expire 1556730695 last 1556730618 May 01 10:14:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 10:14:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 01 10:44:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0840d825-5e1d-ab09-748d-b5fef372f47f (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847076dc800, cur 1556732691 expire 1556732541 last 1556732464 May 01 10:44:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 10:45:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 01 10:45:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 11:35:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c7c1e785-7484-b308-7dc5-6b63513d6220 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9849f4ee0400, cur 1556735759 expire 1556735609 last 1556735532 May 01 11:35:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 11:36:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c7c1e785-7484-b308-7dc5-6b63513d6220 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983c6a2efc00, cur 1556735764 expire 1556735614 last 1556735537 May 01 11:36:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 11:36:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 01 11:36:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 11:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 84153925-7318-d597-37bf-61264542eb58 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982aaeb11000, cur 1556736958 expire 1556736808 last 1556736731 May 01 11:56:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 84153925-7318-d597-37bf-61264542eb58 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98387bf7e000, cur 1556736973 expire 1556736823 last 1556736746 May 01 11:56:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 11:56:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 01 11:56:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 14:10:36 fir-md1-s1 kernel: LNetError: 101316:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 18:06:52 fir-md1-s1 kernel: Lustre: 102768:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556759205/real 1556759205] req@ffff9823e870f800 x1632254373081920/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556759212 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 01 18:06:52 fir-md1-s1 kernel: Lustre: 102768:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 01 18:06:59 fir-md1-s1 kernel: Lustre: 102768:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556759212/real 1556759212] req@ffff9823e870f800 x1632254373081920/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 
lens 296/224 e 0 to 1 dl 1556759219 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 01 18:07:06 fir-md1-s1 kernel: Lustre: 102768:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556759219/real 1556759219] req@ffff9823e870f800 x1632254373081920/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556759226 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 01 18:07:10 fir-md1-s1 kernel: Lustre: 102546:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982a2fe51b00 x1631565224469472/t0(0) o101->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:15/0 lens 1792/3288 e 0 to 0 dl 1556759235 ref 2 fl Interpret:/0/0 rc 0/0 May 01 18:07:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting May 01 18:07:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.24@o2ib6) May 01 18:07:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 18:07:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dfe61200-863e-32be-7d68-5233540a9762 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847f0addc00, cur 1556759237 expire 1556759087 last 1556759010 May 01 18:07:20 fir-md1-s1 kernel: Lustre: 102768:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556759233/real 1556759233] req@ffff9823e870f800 x1632254373081920/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556759240 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 01 18:07:20 fir-md1-s1 kernel: Lustre: 102768:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 01 18:07:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dfe61200-863e-32be-7d68-5233540a9762 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9837f3b95400, cur 1556759250 expire 1556759100 last 1556759023 May 01 18:07:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 18:07:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 01 18:11:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7592b62b-cd74-82e4-03cd-75fb5e0a226b (at 10.8.14.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9848d1a6a400, cur 1556759514 expire 1556759364 last 1556759287 May 01 18:11:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7592b62b-cd74-82e4-03cd-75fb5e0a226b (at 10.8.14.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cabafdc00, cur 1556759517 expire 1556759367 last 1556759290 May 01 18:37:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 01 18:37:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 18:38:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4cd88215-e667-298c-fb54-c17c8301efbb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98289c311c00, cur 1556761121 expire 1556760971 last 1556760894 May 01 18:38:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 18:38:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4cd88215-e667-298c-fb54-c17c8301efbb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98363a726800, cur 1556761138 expire 1556760988 last 1556760911 May 01 18:38:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 18:43:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.14.9@o2ib6) May 01 18:43:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 18:51:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f6dec251-9455-2db5-0c0d-4b1d5c39f7f8 (at 10.8.21.21@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff984ac166f800, cur 1556761881 expire 1556761731 last 1556761654 May 01 18:51:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dd447c0e-bf16-d0be-6449-bf36e688df99 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984ac176b800, cur 1556761894 expire 1556761744 last 1556761667 May 01 18:51:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 18:51:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 01 18:51:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 19:04:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fe35d843-20f4-288f-d8db-52dd32b58570 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847c9778c00, cur 1556762672 expire 1556762522 last 1556762445 May 01 19:04:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 01 19:04:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 19:08:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 64b566c2-ebb5-7da0-af60-514dba7cee07 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982c15ad5800, cur 1556762933 expire 1556762783 last 1556762706 May 01 19:08:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 19:09:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 64b566c2-ebb5-7da0-af60-514dba7cee07 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983229267800, cur 1556762955 expire 1556762805 last 1556762728 May 01 19:09:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 19:09:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 01 19:09:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 19:15:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2acff163-453f-6866-ca52-3be787a802e5 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982b3630f000, cur 1556763345 expire 1556763195 last 1556763118 May 01 19:15:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 01 19:15:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 19:20:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ef0ca740-68f8-0d2f-af07-d739c91e59f6 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982b520d3400, cur 1556763647 expire 1556763497 last 1556763420 May 01 19:20:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 19:21:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 01 19:21:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 19:33:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b169cfff-999c-3e08-edaf-bc412cfb2b0a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983938efa400, cur 1556764435 expire 1556764285 last 1556764208 May 01 19:33:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 19:34:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b169cfff-999c-3e08-edaf-bc412cfb2b0a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983571eb7c00, cur 1556764450 expire 1556764300 last 1556764223 May 01 19:34:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 19:34:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 01 19:34:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 21:00:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6667e1fb-9e5d-8122-f716-8d2ca6b880cd (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983222a73800, cur 1556769645 expire 1556769495 last 1556769418 May 01 21:02:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 01 21:02:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 21:36:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5242334c-3a63-f428-27e9-84a9b8569357 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983880323c00, cur 1556771818 expire 1556771668 last 1556771591 May 01 21:36:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 21:37:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5242334c-3a63-f428-27e9-84a9b8569357 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983880321000, cur 1556771823 expire 1556771673 last 1556771596 May 01 21:37:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 01 21:37:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 01 21:37:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 22:18:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 90fd09f3-1e4c-d89d-b1ef-509c9c50dd06 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985cf57e4800, cur 1556774296 expire 1556774146 last 1556774069 May 01 22:18:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.9.8@o2ib6) May 01 22:18:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 23:31:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e2a571ed-a09d-5b66-3666-df63bf8e2019 (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985ce3d65c00, cur 1556778698 expire 1556778548 last 1556778471 May 01 23:31:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 23:40:35 fir-md1-s1 kernel: Lustre: 102710:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 01 23:42:05 fir-md1-s1 kernel: list passed to list_sort() too long for efficiency May 01 23:42:19 fir-md1-s1 kernel: Lustre: 101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9821b459bf00 x1632078159890016/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:23/0 lens 600/3264 e 1 to 0 dl 1556779343 ref 2 fl Interpret:/0/0 rc 0/0 May 01 23:42:21 fir-md1-s1 kernel: Lustre: 102970:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983a39ba1050 x1631585929788784/t0(0) o4->16749711-2a27-479b-83fc-14b2199ba6af@10.9.104.18@o2ib4:26/0 lens 8680/448 e 1 to 0 dl 1556779346 ref 2 fl Interpret:/0/0 rc 0/0 May 01 23:42:23 fir-md1-s1 kernel: Lustre: 102763:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983180f6c800 x1631534633692496/t0(0) o101->f3bba4e8-9568-4001-6257-88537741a8c9@10.8.29.3@o2ib6:28/0 lens 1768/3288 e 1 to 0 dl 1556779348 ref 2 fl Interpret:/0/0 rc 0/0 May 01 23:42:23 fir-md1-s1 kernel: Lustre: 102763:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 01 23:42:24 fir-md1-s1 kernel: 
Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 01 23:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 01 23:42:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 01 23:42:25 fir-md1-s1 kernel: Lustre: 102778:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9831f9a69b00 x1631604149634912/t0(0) o101->fir-MDT0000-lwp-OST002e_UUID@10.0.10.107@o2ib7:0/0 lens 456/496 e 1 to 0 dl 1556779350 ref 2 fl Interpret:/0/0 rc 0/0 May 01 23:42:25 fir-md1-s1 kernel: Lustre: 102778:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 67 previous similar messages May 01 23:42:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 16749711-2a27-479b-83fc-14b2199ba6af (at 10.9.104.18@o2ib4) reconnecting May 01 23:42:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.104.18@o2ib4) May 01 23:42:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d4242da5-5a9c-4508-f9da-c1e7f36347f4 (at 10.9.114.4@o2ib4) reconnecting May 01 23:42:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.114.4@o2ib4) May 01 23:42:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID May 01 23:42:29 fir-md1-s1 kernel: Lustre: 102822:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9857f819c200 x1631604139115328/t0(0) o101->fir-MDT0000-lwp-OST0016_UUID@10.0.10.103@o2ib7:4/0 lens 456/496 e 1 to 0 dl 1556779354 ref 2 fl Interpret:/0/0 rc 0/0 May 01 23:42:29 fir-md1-s1 kernel: Lustre: 102822:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 45 previous similar messages May 01 23:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fir-MDT0000-lwp-OST0026_UUID (at 10.0.10.107@o2ib7) reconnecting May 01 23:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to d3dcd8ee-7913-062f-8514-9178ef53d789 (at 10.0.10.107@o2ib7) May 01 23:42:31 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages May 01 23:42:31 fir-md1-s1 kernel: Lustre: Skipped 30 previous similar messages May 01 23:42:31 fir-md1-s1 kernel: Lustre: 102963:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556779330/real 1556779330] req@ffff98239efdb900 x1632254603866608/t0(0) o601->fir-MDT0000-lwp-MDT0002@0@lo:23/10 lens 336/336 e 1 to 1 dl 1556779351 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 May 01 23:42:31 fir-md1-s1 kernel: Lustre: 102963:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 01 23:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete May 01 23:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [mdt_io00_057:103101] May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! 
[mdt_io01_029:102923] May 01 23:42:36 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) May 01 23:42:36 fir-md1-s1 kernel: Modules linked in: May 01 23:42:36 fir-md1-s1 kernel: osp(OE) May 01 23:42:36 fir-md1-s1 kernel: mdd(OE) May 01 23:42:36 fir-md1-s1 kernel: lod(OE) May 01 23:42:36 fir-md1-s1 kernel: mdt(OE) May 01 23:42:36 fir-md1-s1 kernel: lfsck(OE) May 01 23:42:36 fir-md1-s1 kernel: mgs(OE) May 01 23:42:36 fir-md1-s1 kernel: mgc(OE) May 01 23:42:36 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:42:36 fir-md1-s1 kernel: lquota(OE) May 01 23:42:36 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:42:36 fir-md1-s1 kernel: lustre(OE) May 01 23:42:36 fir-md1-s1 kernel: lmv(OE) May 01 23:42:36 fir-md1-s1 kernel: mdc(OE) May 01 23:42:36 fir-md1-s1 kernel: osc(OE) May 01 23:42:36 fir-md1-s1 kernel: lov(OE) May 01 23:42:36 fir-md1-s1 kernel: fid(OE) May 01 23:42:36 fir-md1-s1 kernel: fld(OE) May 01 23:42:36 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:42:36 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:42:36 fir-md1-s1 kernel: obdclass(OE) May 01 23:42:36 fir-md1-s1 kernel: lnet(OE) May 01 23:42:36 fir-md1-s1 kernel: libcfs(OE) May 01 23:42:36 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:42:36 fir-md1-s1 kernel: auth_rpcgss May 01 23:42:36 fir-md1-s1 kernel: nfsv4 May 01 23:42:36 fir-md1-s1 kernel: dns_resolver May 01 23:42:36 fir-md1-s1 kernel: nfs May 01 23:42:36 fir-md1-s1 kernel: lockd May 01 23:42:36 fir-md1-s1 kernel: grace May 01 23:42:36 fir-md1-s1 kernel: fscache May 01 23:42:36 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:42:36 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:42:36 fir-md1-s1 
kernel: iw_cm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_cm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_umad(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:42:36 fir-md1-s1 kernel: dell_rbu May 01 23:42:36 fir-md1-s1 kernel: sunrpc May 01 23:42:36 fir-md1-s1 kernel: vfat May 01 23:42:36 fir-md1-s1 kernel: fat May 01 23:42:36 fir-md1-s1 kernel: dm_round_robin May 01 23:42:36 fir-md1-s1 kernel: amd64_edac_mod May 01 23:42:36 fir-md1-s1 kernel: edac_mce_amd May 01 23:42:36 fir-md1-s1 kernel: kvm_amd May 01 23:42:36 fir-md1-s1 kernel: kvm May 01 23:42:36 fir-md1-s1 kernel: ses May 01 23:42:36 fir-md1-s1 kernel: irqbypass May 01 23:42:36 fir-md1-s1 kernel: crc32_pclmul May 01 23:42:36 fir-md1-s1 kernel: enclosure May 01 23:42:36 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:42:36 fir-md1-s1 kernel: dcdbas May 01 23:42:36 fir-md1-s1 kernel: aesni_intel May 01 23:42:36 fir-md1-s1 kernel: lrw May 01 23:42:36 fir-md1-s1 kernel: gf128mul May 01 23:42:36 fir-md1-s1 kernel: glue_helper May 01 23:42:36 fir-md1-s1 kernel: ablk_helper May 01 23:42:36 fir-md1-s1 kernel: cryptd May 01 23:42:36 fir-md1-s1 kernel: ipmi_si May 01 23:42:36 fir-md1-s1 kernel: pcspkr May 01 23:42:36 fir-md1-s1 kernel: ipmi_devintf May 01 23:42:36 fir-md1-s1 kernel: ccp May 01 23:42:36 fir-md1-s1 kernel: i2c_piix4 May 01 23:42:36 fir-md1-s1 kernel: dm_multipath May 01 23:42:36 fir-md1-s1 kernel: sg May 01 23:42:36 fir-md1-s1 kernel: k10temp May 01 23:42:36 fir-md1-s1 kernel: ipmi_msghandler May 01 23:42:36 fir-md1-s1 kernel: dm_mod May 01 23:42:36 fir-md1-s1 kernel: acpi_power_meter May 01 23:42:36 fir-md1-s1 kernel: knem(OE) May 01 23:42:36 fir-md1-s1 kernel: ip_tables May 01 23:42:36 fir-md1-s1 kernel: ext4 May 01 23:42:36 fir-md1-s1 kernel: mbcache May 01 23:42:36 fir-md1-s1 kernel: jbd2 May 
01 23:42:36 fir-md1-s1 kernel: sd_mod May 01 23:42:36 fir-md1-s1 kernel: crc_t10dif May 01 23:42:36 fir-md1-s1 kernel: crct10dif_generic May 01 23:42:36 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_core(OE) May 01 23:42:36 fir-md1-s1 kernel: i2c_algo_bit May 01 23:42:36 fir-md1-s1 kernel: drm_kms_helper May 01 23:42:36 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:42:36 fir-md1-s1 kernel: syscopyarea May 01 23:42:36 fir-md1-s1 kernel: sysfillrect May 01 23:42:36 fir-md1-s1 kernel: sysimgblt May 01 23:42:36 fir-md1-s1 kernel: fb_sys_fops May 01 23:42:36 fir-md1-s1 kernel: mlxfw(OE) May 01 23:42:36 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:42:36 fir-md1-s1 kernel: ttm May 01 23:42:36 fir-md1-s1 kernel: devlink May 01 23:42:36 fir-md1-s1 kernel: ahci May 01 23:42:36 fir-md1-s1 kernel: crct10dif_common May 01 23:42:36 fir-md1-s1 kernel: libahci May 01 23:42:36 fir-md1-s1 kernel: drm May 01 23:42:36 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:42:36 fir-md1-s1 kernel: tg3 May 01 23:42:36 fir-md1-s1 kernel: crc32c_intel May 01 23:42:36 fir-md1-s1 kernel: libata May 01 23:42:36 fir-md1-s1 kernel: megaraid_sas May 01 23:42:36 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:42:36 fir-md1-s1 kernel: ptp May 01 23:42:36 fir-md1-s1 kernel: pps_core May 01 23:42:36 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:42:36 fir-md1-s1 kernel: raid_class May 01 23:42:36 fir-md1-s1 kernel: scsi_transport_sas May 01 23:42:36 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:42:36 fir-md1-s1 kernel: May 01 23:42:36 fir-md1-s1 kernel: CPU: 9 PID: 102923 Comm: mdt_io01_029 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:36 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:36 fir-md1-s1 kernel: task: ffff985c7cfbd140 ti: ffff985cbe4d8000 task.ti: ffff985cbe4d8000 May 01 23:42:36 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:36 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x15e/0x200 May 01 23:42:36 fir-md1-s1 kernel: RSP: 0018:ffff985cbe4db800 EFLAGS: 00000212 May 01 23:42:36 fir-md1-s1 kernel: RAX: 0000000000000101 RBX: ffff983165105ac0 RCX: 0000000000490000 May 01 23:42:36 fir-md1-s1 kernel: RDX: 0000000000110101 RSI: 0000000000000101 RDI: ffff982c9fc8c480 May 01 23:42:36 fir-md1-s1 kernel: RBP: ffff985cbe4db800 R08: ffff983cff69b780 R09: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R10: ffff983cff69f140 R11: ffffde3f18cce000 R12: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R13: ffff985cbe4db7a0 R14: ffff983165105830 R15: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: FS: 00007fddcbed4880(0000) GS:ffff983cff680000(0000) knlGS:0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:36 fir-md1-s1 kernel: CR2: 00007f50e83bb000 CR3: 00000015323e8000 CR4: 00000000003407e0 May 01 23:42:36 fir-md1-s1 kernel: Call Trace: May 01 23:42:36 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:36 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:42:36 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:42:36 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:42:36 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:42:36 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:36 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:36 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: Code: May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 18 May 01 23:42:36 fir-md1-s1 kernel: 09 May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 17 May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: b7 May 01 23:42:36 fir-md1-s1 kernel: c2 May 01 23:42:36 fir-md1-s1 kernel: 85 May 01 23:42:36 fir-md1-s1 kernel: c0 May 01 23:42:36 fir-md1-s1 kernel: 74 May 01 23:42:36 fir-md1-s1 kernel: 21 May 01 23:42:36 fir-md1-s1 kernel: 83 May 01 23:42:36 fir-md1-s1 kernel: f8 May 01 23:42:36 fir-md1-s1 kernel: 03 May 01 23:42:36 fir-md1-s1 kernel: 75 May 01 23:42:36 fir-md1-s1 kernel: 10 May 01 23:42:36 fir-md1-s1 kernel: eb May 01 23:42:36 fir-md1-s1 kernel: 1a May 01 23:42:36 fir-md1-s1 kernel: 66 May 01 23:42:36 fir-md1-s1 kernel: 2e May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 1f May 01 23:42:36 fir-md1-s1 kernel: 84 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 85 May 01 23:42:36 fir-md1-s1 kernel: c0 May 01 23:42:36 fir-md1-s1 kernel: 74 May 01 23:42:36 fir-md1-s1 kernel: 0c May 01 23:42:36 fir-md1-s1 kernel: f3 May 01 23:42:36 fir-md1-s1 kernel: 90 May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 17 May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: b7 May 01 23:42:36 fir-md1-s1 kernel: c2 May 01 23:42:36 fir-md1-s1 kernel: 83 May 01 23:42:36 fir-md1-s1 kernel: f8 May 01 23:42:36 fir-md1-s1 kernel: 03 May 01 23:42:36 fir-md1-s1 kernel: <75> May 01 23:42:36 fir-md1-s1 kernel: f0 May 01 23:42:36 fir-md1-s1 kernel: be May 01 23:42:36 fir-md1-s1 kernel: 01 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 
kernel: eb May 01 23:42:36 fir-md1-s1 kernel: 15 May 01 23:42:36 fir-md1-s1 kernel: 66 May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 1f May 01 23:42:36 fir-md1-s1 kernel: 84 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 89 May 01 23:42:36 fir-md1-s1 kernel: d0 May 01 23:42:36 fir-md1-s1 kernel: f0 May 01 23:42:36 fir-md1-s1 kernel: May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [mdt_io00_073:103263] May 01 23:42:36 fir-md1-s1 kernel: Modules linked in: May 01 23:42:36 fir-md1-s1 kernel: osp(OE) May 01 23:42:36 fir-md1-s1 kernel: mdd(OE) May 01 23:42:36 fir-md1-s1 kernel: lod(OE) May 01 23:42:36 fir-md1-s1 kernel: mdt(OE) May 01 23:42:36 fir-md1-s1 kernel: lfsck(OE) May 01 23:42:36 fir-md1-s1 kernel: mgs(OE) May 01 23:42:36 fir-md1-s1 kernel: mgc(OE) May 01 23:42:36 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:42:36 fir-md1-s1 kernel: lquota(OE) May 01 23:42:36 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:42:36 fir-md1-s1 kernel: lustre(OE) May 01 23:42:36 fir-md1-s1 kernel: lmv(OE) May 01 23:42:36 fir-md1-s1 kernel: mdc(OE) May 01 23:42:36 fir-md1-s1 kernel: osc(OE) May 01 23:42:36 fir-md1-s1 kernel: lov(OE) May 01 23:42:36 fir-md1-s1 kernel: fid(OE) May 01 23:42:36 fir-md1-s1 kernel: fld(OE) May 01 23:42:36 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:42:36 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:42:36 fir-md1-s1 kernel: obdclass(OE) May 01 23:42:36 fir-md1-s1 kernel: lnet(OE) May 01 23:42:36 fir-md1-s1 kernel: libcfs(OE) May 01 23:42:36 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:42:36 fir-md1-s1 kernel: auth_rpcgss May 01 23:42:36 fir-md1-s1 kernel: nfsv4 May 01 23:42:36 fir-md1-s1 kernel: dns_resolver May 01 23:42:36 fir-md1-s1 kernel: nfs May 01 23:42:36 fir-md1-s1 kernel: lockd May 01 23:42:36 fir-md1-s1 kernel: grace 
May 01 23:42:36 fir-md1-s1 kernel: fscache May 01 23:42:36 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:42:36 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:42:36 fir-md1-s1 kernel: iw_cm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_cm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_umad(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:42:36 fir-md1-s1 kernel: dell_rbu May 01 23:42:36 fir-md1-s1 kernel: sunrpc May 01 23:42:36 fir-md1-s1 kernel: vfat May 01 23:42:36 fir-md1-s1 kernel: fat May 01 23:42:36 fir-md1-s1 kernel: dm_round_robin May 01 23:42:36 fir-md1-s1 kernel: amd64_edac_mod May 01 23:42:36 fir-md1-s1 kernel: edac_mce_amd May 01 23:42:36 fir-md1-s1 kernel: kvm_amd May 01 23:42:36 fir-md1-s1 kernel: kvm May 01 23:42:36 fir-md1-s1 kernel: ses May 01 23:42:36 fir-md1-s1 kernel: irqbypass May 01 23:42:36 fir-md1-s1 kernel: crc32_pclmul May 01 23:42:36 fir-md1-s1 kernel: enclosure May 01 23:42:36 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:42:36 fir-md1-s1 kernel: dcdbas May 01 23:42:36 fir-md1-s1 kernel: aesni_intel May 01 23:42:36 fir-md1-s1 kernel: lrw May 01 23:42:36 fir-md1-s1 kernel: gf128mul May 01 23:42:36 fir-md1-s1 kernel: glue_helper May 01 23:42:36 fir-md1-s1 kernel: ablk_helper May 01 23:42:36 fir-md1-s1 kernel: cryptd May 01 23:42:36 fir-md1-s1 kernel: ipmi_si May 01 23:42:36 fir-md1-s1 kernel: pcspkr May 01 23:42:36 fir-md1-s1 kernel: ipmi_devintf May 01 23:42:36 fir-md1-s1 kernel: ccp May 01 23:42:36 fir-md1-s1 kernel: i2c_piix4 May 01 23:42:36 fir-md1-s1 kernel: dm_multipath May 01 23:42:36 fir-md1-s1 kernel: sg May 01 23:42:36 fir-md1-s1 kernel: k10temp May 01 23:42:36 fir-md1-s1 kernel: ipmi_msghandler May 01 23:42:36 fir-md1-s1 kernel: dm_mod May 01 23:42:36 fir-md1-s1 kernel: acpi_power_meter May 01 
23:42:36 fir-md1-s1 kernel: knem(OE) May 01 23:42:36 fir-md1-s1 kernel: ip_tables May 01 23:42:36 fir-md1-s1 kernel: ext4 May 01 23:42:36 fir-md1-s1 kernel: mbcache May 01 23:42:36 fir-md1-s1 kernel: jbd2 May 01 23:42:36 fir-md1-s1 kernel: sd_mod May 01 23:42:36 fir-md1-s1 kernel: crc_t10dif May 01 23:42:36 fir-md1-s1 kernel: crct10dif_generic May 01 23:42:36 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_core(OE) May 01 23:42:36 fir-md1-s1 kernel: i2c_algo_bit May 01 23:42:36 fir-md1-s1 kernel: drm_kms_helper May 01 23:42:36 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:42:36 fir-md1-s1 kernel: syscopyarea May 01 23:42:36 fir-md1-s1 kernel: sysfillrect May 01 23:42:36 fir-md1-s1 kernel: sysimgblt May 01 23:42:36 fir-md1-s1 kernel: fb_sys_fops May 01 23:42:36 fir-md1-s1 kernel: mlxfw(OE) May 01 23:42:36 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:42:36 fir-md1-s1 kernel: ttm May 01 23:42:36 fir-md1-s1 kernel: devlink May 01 23:42:36 fir-md1-s1 kernel: ahci May 01 23:42:36 fir-md1-s1 kernel: crct10dif_common May 01 23:42:36 fir-md1-s1 kernel: libahci May 01 23:42:36 fir-md1-s1 kernel: drm May 01 23:42:36 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:42:36 fir-md1-s1 kernel: tg3 May 01 23:42:36 fir-md1-s1 kernel: crc32c_intel May 01 23:42:36 fir-md1-s1 kernel: libata May 01 23:42:36 fir-md1-s1 kernel: megaraid_sas May 01 23:42:36 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:42:36 fir-md1-s1 kernel: ptp May 01 23:42:36 fir-md1-s1 kernel: pps_core May 01 23:42:36 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:42:36 fir-md1-s1 kernel: raid_class May 01 23:42:36 fir-md1-s1 kernel: scsi_transport_sas May 01 23:42:36 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:42:36 fir-md1-s1 kernel: May 01 23:42:36 fir-md1-s1 kernel: CPU: 12 PID: 103263 Comm: mdt_io00_073 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:36 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:36 fir-md1-s1 kernel: task: ffff984cba23e180 ti: ffff98286afac000 task.ti: ffff98286afac000 May 01 23:42:36 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:36 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x126/0x200 May 01 23:42:36 fir-md1-s1 kernel: RSP: 0018:ffff98286afaf750 EFLAGS: 00000246 May 01 23:42:36 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff9831739477d8 RCX: 0000000000610000 May 01 23:42:36 fir-md1-s1 kernel: RDX: ffff984cff81b780 RSI: 0000000001110101 RDI: ffff982c9fc8c480 May 01 23:42:36 fir-md1-s1 kernel: RBP: ffff98286afaf750 R08: ffff982cfeedb780 R09: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R10: ffff982cfeedf140 R11: ffffde3edb488200 R12: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R13: ffff98286afaf6f0 R14: ffff983173947548 R15: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: FS: 00007fde62083880(0000) GS:ffff982cfeec0000(0000) knlGS:0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:36 fir-md1-s1 kernel: CR2: 00007f427f58b000 CR3: 000000203caa6000 CR4: 00000000003407e0 May 01 23:42:36 fir-md1-s1 kernel: Call Trace: May 01 23:42:36 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:36 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? zone_statistics+0x88/0xa0 May 01 23:42:36 fir-md1-s1 kernel: [] ? qsd_op_begin+0xb1/0x4b0 [lquota] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? 
ldiskfs_inode_attach_jinode+0x55/0xd0 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] osd_write_commit+0x3a2/0x8c0 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_commitrw_write.isra.46+0x608/0xd20 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_obd_commitrw+0x29b/0x520 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] obd_commitrw+0x9c/0x370 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_brw_write+0x100d/0x1a90 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:42:36 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:36 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:36 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: Code: May 01 23:42:36 fir-md1-s1 kernel: 0d May 01 23:42:36 fir-md1-s1 kernel: 48 May 01 23:42:36 fir-md1-s1 kernel: 98 May 01 23:42:36 fir-md1-s1 kernel: 83 May 01 23:42:36 fir-md1-s1 kernel: e2 May 01 23:42:36 fir-md1-s1 kernel: 30 May 01 23:42:36 fir-md1-s1 kernel: 48 May 01 23:42:36 fir-md1-s1 kernel: 81 May 01 23:42:36 fir-md1-s1 kernel: c2 May 01 23:42:36 fir-md1-s1 kernel: 80 May 01 23:42:36 fir-md1-s1 kernel: b7 May 01 23:42:36 fir-md1-s1 kernel: 01 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 48 May 01 23:42:36 fir-md1-s1 kernel: 03 May 01 23:42:36 fir-md1-s1 kernel: 14 May 01 23:42:36 fir-md1-s1 kernel: c5 May 01 23:42:36 fir-md1-s1 kernel: 60 May 01 23:42:36 fir-md1-s1 kernel: b9 May 01 23:42:36 fir-md1-s1 kernel: b4 May 01 23:42:36 fir-md1-s1 kernel: b7 May 01 23:42:36 fir-md1-s1 kernel: 4c May 01 23:42:36 fir-md1-s1 kernel: 89 May 01 23:42:36 fir-md1-s1 kernel: 02 May 01 23:42:36 fir-md1-s1 kernel: 41 May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 40 May 01 23:42:36 fir-md1-s1 kernel: 08 May 01 23:42:36 fir-md1-s1 kernel: 85 May 01 23:42:36 fir-md1-s1 kernel: c0 May 01 23:42:36 fir-md1-s1 kernel: 75 May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 1f May 01 23:42:36 fir-md1-s1 kernel: 44 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: f3 May 01 23:42:36 fir-md1-s1 kernel: 90 May 01 23:42:36 fir-md1-s1 kernel: 41 May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 40 May 01 23:42:36 fir-md1-s1 kernel: 08 May 01 23:42:36 fir-md1-s1 kernel: <85> May 01 23:42:36 fir-md1-s1 kernel: c0 May 01 23:42:36 fir-md1-s1 kernel: 74 May 01 23:42:36 fir-md1-s1 kernel: f6 May 01 23:42:36 fir-md1-s1 kernel: 4d May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 08 May 01 23:42:36 fir-md1-s1 
kernel: 4d May 01 23:42:36 fir-md1-s1 kernel: 85 May 01 23:42:36 fir-md1-s1 kernel: c9 May 01 23:42:36 fir-md1-s1 kernel: 74 May 01 23:42:36 fir-md1-s1 kernel: 04 May 01 23:42:36 fir-md1-s1 kernel: 41 May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 18 May 01 23:42:36 fir-md1-s1 kernel: 09 May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 17 May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: b7 May 01 23:42:36 fir-md1-s1 kernel: c2 May 01 23:42:36 fir-md1-s1 kernel: May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#16 stuck for 23s! [mdt00_018:102388] May 01 23:42:36 fir-md1-s1 kernel: Modules linked in: May 01 23:42:36 fir-md1-s1 kernel: osp(OE) May 01 23:42:36 fir-md1-s1 kernel: mdd(OE) May 01 23:42:36 fir-md1-s1 kernel: lod(OE) May 01 23:42:36 fir-md1-s1 kernel: mdt(OE) May 01 23:42:36 fir-md1-s1 kernel: lfsck(OE) May 01 23:42:36 fir-md1-s1 kernel: mgs(OE) May 01 23:42:36 fir-md1-s1 kernel: mgc(OE) May 01 23:42:36 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:42:36 fir-md1-s1 kernel: lquota(OE) May 01 23:42:36 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:42:36 fir-md1-s1 kernel: lustre(OE) May 01 23:42:36 fir-md1-s1 kernel: lmv(OE) May 01 23:42:36 fir-md1-s1 kernel: mdc(OE) May 01 23:42:36 fir-md1-s1 kernel: osc(OE) May 01 23:42:36 fir-md1-s1 kernel: lov(OE) May 01 23:42:36 fir-md1-s1 kernel: fid(OE) May 01 23:42:36 fir-md1-s1 kernel: fld(OE) May 01 23:42:36 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:42:36 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:42:36 fir-md1-s1 kernel: obdclass(OE) May 01 23:42:36 fir-md1-s1 kernel: lnet(OE) May 01 23:42:36 fir-md1-s1 kernel: libcfs(OE) May 01 23:42:36 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:42:36 fir-md1-s1 kernel: auth_rpcgss May 01 23:42:36 fir-md1-s1 kernel: nfsv4 May 01 23:42:36 fir-md1-s1 kernel: dns_resolver May 01 23:42:36 fir-md1-s1 kernel: nfs May 01 23:42:36 fir-md1-s1 kernel: lockd May 01 23:42:36 fir-md1-s1 kernel: grace May 
01 23:42:36 fir-md1-s1 kernel: fscache May 01 23:42:36 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:42:36 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:42:36 fir-md1-s1 kernel: iw_cm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_cm(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_umad(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:42:36 fir-md1-s1 kernel: dell_rbu May 01 23:42:36 fir-md1-s1 kernel: sunrpc May 01 23:42:36 fir-md1-s1 kernel: vfat May 01 23:42:36 fir-md1-s1 kernel: fat May 01 23:42:36 fir-md1-s1 kernel: dm_round_robin May 01 23:42:36 fir-md1-s1 kernel: amd64_edac_mod May 01 23:42:36 fir-md1-s1 kernel: edac_mce_amd May 01 23:42:36 fir-md1-s1 kernel: kvm_amd May 01 23:42:36 fir-md1-s1 kernel: kvm May 01 23:42:36 fir-md1-s1 kernel: ses May 01 23:42:36 fir-md1-s1 kernel: irqbypass May 01 23:42:36 fir-md1-s1 kernel: crc32_pclmul May 01 23:42:36 fir-md1-s1 kernel: enclosure May 01 23:42:36 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:42:36 fir-md1-s1 kernel: dcdbas May 01 23:42:36 fir-md1-s1 kernel: aesni_intel May 01 23:42:36 fir-md1-s1 kernel: lrw May 01 23:42:36 fir-md1-s1 kernel: gf128mul May 01 23:42:36 fir-md1-s1 kernel: glue_helper May 01 23:42:36 fir-md1-s1 kernel: ablk_helper May 01 23:42:36 fir-md1-s1 kernel: cryptd May 01 23:42:36 fir-md1-s1 kernel: ipmi_si May 01 23:42:36 fir-md1-s1 kernel: pcspkr May 01 23:42:36 fir-md1-s1 kernel: ipmi_devintf May 01 23:42:36 fir-md1-s1 kernel: ccp May 01 23:42:36 fir-md1-s1 kernel: i2c_piix4 May 01 23:42:36 fir-md1-s1 kernel: dm_multipath May 01 23:42:36 fir-md1-s1 kernel: sg May 01 23:42:36 fir-md1-s1 kernel: k10temp May 01 23:42:36 fir-md1-s1 kernel: ipmi_msghandler May 01 23:42:36 fir-md1-s1 kernel: dm_mod May 01 23:42:36 fir-md1-s1 kernel: acpi_power_meter May 01 
23:42:36 fir-md1-s1 kernel: knem(OE) May 01 23:42:36 fir-md1-s1 kernel: ip_tables May 01 23:42:36 fir-md1-s1 kernel: ext4 May 01 23:42:36 fir-md1-s1 kernel: mbcache May 01 23:42:36 fir-md1-s1 kernel: jbd2 May 01 23:42:36 fir-md1-s1 kernel: sd_mod May 01 23:42:36 fir-md1-s1 kernel: crc_t10dif May 01 23:42:36 fir-md1-s1 kernel: crct10dif_generic May 01 23:42:36 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:42:36 fir-md1-s1 kernel: ib_core(OE) May 01 23:42:36 fir-md1-s1 kernel: i2c_algo_bit May 01 23:42:36 fir-md1-s1 kernel: drm_kms_helper May 01 23:42:36 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:42:36 fir-md1-s1 kernel: syscopyarea May 01 23:42:36 fir-md1-s1 kernel: sysfillrect May 01 23:42:36 fir-md1-s1 kernel: sysimgblt May 01 23:42:36 fir-md1-s1 kernel: fb_sys_fops May 01 23:42:36 fir-md1-s1 kernel: mlxfw(OE) May 01 23:42:36 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:42:36 fir-md1-s1 kernel: ttm May 01 23:42:36 fir-md1-s1 kernel: devlink May 01 23:42:36 fir-md1-s1 kernel: ahci May 01 23:42:36 fir-md1-s1 kernel: crct10dif_common May 01 23:42:36 fir-md1-s1 kernel: libahci May 01 23:42:36 fir-md1-s1 kernel: drm May 01 23:42:36 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:42:36 fir-md1-s1 kernel: tg3 May 01 23:42:36 fir-md1-s1 kernel: crc32c_intel May 01 23:42:36 fir-md1-s1 kernel: libata May 01 23:42:36 fir-md1-s1 kernel: megaraid_sas May 01 23:42:36 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:42:36 fir-md1-s1 kernel: ptp May 01 23:42:36 fir-md1-s1 kernel: pps_core May 01 23:42:36 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:42:36 fir-md1-s1 kernel: raid_class May 01 23:42:36 fir-md1-s1 kernel: scsi_transport_sas May 01 23:42:36 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:42:36 fir-md1-s1 kernel: May 01 23:42:36 fir-md1-s1 kernel: CPU: 16 PID: 102388 Comm: mdt00_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:36 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:36 fir-md1-s1 kernel: task: ffff985884642080 ti: ffff984c4b64c000 task.ti: ffff984c4b64c000 May 01 23:42:36 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_inode_touch_time_cmp+0xd/0x90 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: RSP: 0018:ffff984c4b64f180 EFLAGS: 00000282 May 01 23:42:36 fir-md1-s1 kernel: RAX: 8000041400080000 RBX: ffffffffb7019f22 RCX: 000000010c9dde88 May 01 23:42:36 fir-md1-s1 kernel: RDX: ffff984b013d4600 RSI: ffff9836de9d5ac8 RDI: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: RBP: ffff984c4b64f1d0 R08: ffff984c4b64f300 R09: 00000000003ecd00 May 01 23:42:36 fir-md1-s1 kernel: R10: 0000000047bdbb01 R11: ffffde3f261ef6c0 R12: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R13: ffff982d3f224c80 R14: ffff983cf8b38400 R15: ffff982cfef254b8 May 01 23:42:36 fir-md1-s1 kernel: FS: 00007f32ccf2c740(0000) GS:ffff982cfef00000(0000) knlGS:0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:36 fir-md1-s1 kernel: CR2: 00007f32c5fef140 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:42:36 fir-md1-s1 kernel: Call Trace: May 01 23:42:36 fir-md1-s1 kernel: [] ? merge+0x62/0xc0 May 01 23:42:36 fir-md1-s1 kernel: [] ? ldiskfs_init_inode_table+0x410/0x410 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] list_sort+0x9b/0x250 May 01 23:42:36 fir-md1-s1 kernel: [] __ldiskfs_es_shrink+0x1ce/0x2a0 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_es_shrink+0xb4/0x130 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] shrink_slab+0x175/0x340 May 01 23:42:36 fir-md1-s1 kernel: [] ? zone_watermark_ok+0x1f/0x30 May 01 23:42:36 fir-md1-s1 kernel: [] ? compaction_suitable+0xa3/0xb0 May 01 23:42:36 fir-md1-s1 kernel: [] zone_reclaim+0x1d1/0x2f0 May 01 23:42:36 fir-md1-s1 kernel: [] get_page_from_freelist+0x87b/0xa70 May 01 23:42:36 fir-md1-s1 kernel: [] ? 
__getblk+0x2d/0x300 May 01 23:42:36 fir-md1-s1 kernel: [] __alloc_pages_nodemask+0x176/0x420 May 01 23:42:36 fir-md1-s1 kernel: [] alloc_pages_current+0x98/0x110 May 01 23:42:36 fir-md1-s1 kernel: [] new_slab+0x2c5/0x390 May 01 23:42:36 fir-md1-s1 kernel: [] ___slab_alloc+0x3ac/0x4f0 May 01 23:42:36 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:42:36 fir-md1-s1 kernel: [] ? fld_cache_lookup+0x36/0x1a0 [fld] May 01 23:42:36 fir-md1-s1 kernel: [] ? fld_local_lookup+0x62/0x270 [fld] May 01 23:42:36 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:42:36 fir-md1-s1 kernel: [] __slab_alloc+0x40/0x5c May 01 23:42:36 fir-md1-s1 kernel: [] kmem_cache_alloc+0x19b/0x1f0 May 01 23:42:36 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:42:36 fir-md1-s1 kernel: [] osp_object_alloc+0x40/0x170 [osp] May 01 23:42:36 fir-md1-s1 kernel: [] lod_object_init+0x1e7/0x3c0 [lod] May 01 23:42:36 fir-md1-s1 kernel: [] lu_object_alloc+0xe5/0x320 [obdclass] May 01 23:42:36 fir-md1-s1 kernel: [] lu_object_find_at+0x76/0x280 [obdclass] May 01 23:42:36 fir-md1-s1 kernel: [] lu_object_find_slice+0x1f/0x90 [obdclass] May 01 23:42:36 fir-md1-s1 kernel: [] mdd_object_find+0x10/0x70 [mdd] May 01 23:42:36 fir-md1-s1 kernel: [] obf_lookup+0x2c9/0x350 [mdd] May 01 23:42:36 fir-md1-s1 kernel: [] ? req_capsule_get_size+0x31/0x70 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0xf7c/0x1c30 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? __req_capsule_get+0x15f/0x740 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_get_flags+0x2c/0xa0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] ? 
mdt_intent_layout+0xcc0/0xcc0 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? cfs_hash_add+0xbe/0x1a0 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:36 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:36 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40
May 01 23:42:36 fir-md1-s1 kernel: Code: ff 8d 4a 01 89 d0 f0 0f b1 0f 39 d0 0f 84 fb fd ff ff 89 c2 eb e2 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8b 86 e8 fc ff ff <48> 89 e5 48 c1 e8 2b a8 01 74 15 48 8b 8a e8 fc ff ff b8 01 00
May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#18 stuck for 23s! [mdt_io02_065:103134]
May 01 23:42:36 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
May 01 23:42:36 fir-md1-s1 kernel: CPU: 18 PID: 103134 Comm: mdt_io02_065 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:36 fir-md1-s1 kernel: Hardware name: Dell
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:36 fir-md1-s1 kernel: task: ffff985ccda90000 ti: ffff98583efd4000 task.ti: ffff98583efd4000 May 01 23:42:36 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:36 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:42:36 fir-md1-s1 kernel: RSP: 0018:ffff98583efd7750 EFLAGS: 00000246 May 01 23:42:36 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff983164d60378 RCX: 0000000000910000 May 01 23:42:36 fir-md1-s1 kernel: RDX: ffff983cff69b780 RSI: 0000000000490101 RDI: ffff982c9fc8c480 May 01 23:42:36 fir-md1-s1 kernel: RBP: ffff98583efd7750 R08: ffff984cff71b780 R09: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R10: ffff984cff71f140 R11: ffffde3ef98bd000 R12: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R13: ffff98583efd76f0 R14: ffff983164d600e8 R15: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: FS: 00007f010bbcf880(0000) GS:ffff984cff700000(0000) knlGS:0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:36 fir-md1-s1 kernel: CR2: 0000000001c9e8e0 CR3: 000000402db9c000 CR4: 00000000003407e0 May 01 23:42:36 fir-md1-s1 kernel: Call Trace: May 01 23:42:36 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:36 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? zone_statistics+0x88/0xa0 May 01 23:42:36 fir-md1-s1 kernel: [] ? qsd_op_begin+0xb1/0x4b0 [lquota] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? 
ldiskfs_inode_attach_jinode+0x55/0xd0 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] osd_write_commit+0x3a2/0x8c0 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_commitrw_write.isra.46+0x608/0xd20 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_obd_commitrw+0x29b/0x520 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] obd_commitrw+0x9c/0x370 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_brw_write+0x100d/0x1a90 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:42:36 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:36 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:36 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40
May 01 23:42:36 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [mdt_io01_082:103083]
May 01 23:42:36 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
May 01 23:42:36 fir-md1-s1 kernel: CPU: 25 PID: 103083 Comm: mdt_io01_082 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:36 fir-md1-s1 kernel: Hardware name: Dell
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:36 fir-md1-s1 kernel: task: ffff985cfe905140 ti: ffff985ccaf18000 task.ti: ffff985ccaf18000 May 01 23:42:36 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:36 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x128/0x200 May 01 23:42:36 fir-md1-s1 kernel: RSP: 0018:ffff985ccaf1b800 EFLAGS: 00000246 May 01 23:42:36 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff9831703ceb60 RCX: 0000000000c90000 May 01 23:42:36 fir-md1-s1 kernel: RDX: ffff984cff71b780 RSI: 0000000000910101 RDI: ffff982c9fc8c480 May 01 23:42:36 fir-md1-s1 kernel: RBP: ffff985ccaf1b800 R08: ffff983cff79b780 R09: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R10: ffff983cff79f140 R11: ffffde3fa5607800 R12: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R13: ffff985ccaf1b7a0 R14: ffff9831703ce8d0 R15: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: FS: 00007f427f792740(0000) GS:ffff983cff780000(0000) knlGS:0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:36 fir-md1-s1 kernel: CR2: 00007f427f58b000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:42:36 fir-md1-s1 kernel: Call Trace: May 01 23:42:36 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:36 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:42:36 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:42:36 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:42:36 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:42:36 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:36 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:36 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40
May 01 23:42:36 fir-md1-s1 kernel: Code: 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 85 c0 <74> f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 85 c0
May 01 23:42:36 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! [mdt_io02_034:102984]
May 01 23:42:36 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs]
May 01 23:42:36 fir-md1-s1 kernel: CPU: 34 PID: 102984 Comm: mdt_io02_034 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:36 fir-md1-s1 kernel: Hardware name: Dell
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:36 fir-md1-s1 kernel: task: ffff982cf9de4100 ti: ffff985ce80d4000 task.ti: ffff985ce80d4000 May 01 23:42:36 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:36 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:42:36 fir-md1-s1 kernel: RSP: 0018:ffff985ce80d7800 EFLAGS: 00000246 May 01 23:42:36 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff983165105698 RCX: 0000000001110000 May 01 23:42:36 fir-md1-s1 kernel: RDX: ffff983cff79b780 RSI: 0000000000c90101 RDI: ffff982c9fc8c480 May 01 23:42:36 fir-md1-s1 kernel: RBP: ffff985ce80d7800 R08: ffff984cff81b780 R09: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R10: ffff984cff81f140 R11: ffffde3fa7770c00 R12: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R13: ffff985ce80d77a0 R14: ffff983165105408 R15: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: FS: 00007fe19c902740(0000) GS:ffff984cff800000(0000) knlGS:0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:36 fir-md1-s1 kernel: CR2: 00007fe19bb327c0 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:42:36 fir-md1-s1 kernel: Call Trace: May 01 23:42:36 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:36 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:42:36 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? tgt_free_reply_data+0x128/0x3b0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? kfree+0x106/0x140 May 01 23:42:36 fir-md1-s1 kernel: [] ? tgt_free_reply_data+0x128/0x3b0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:36 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:36 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: Code: May 01 23:42:36 fir-md1-s1 kernel: 13 May 01 23:42:36 fir-md1-s1 kernel: 48 May 01 23:42:36 fir-md1-s1 kernel: c1 May 01 23:42:36 fir-md1-s1 kernel: ea May 01 23:42:36 fir-md1-s1 kernel: 0d May 01 23:42:36 fir-md1-s1 kernel: 48 May 01 23:42:36 fir-md1-s1 kernel: 98 May 01 23:42:36 fir-md1-s1 kernel: 83 May 01 23:42:36 fir-md1-s1 kernel: e2 May 01 23:42:36 fir-md1-s1 kernel: 30 May 01 23:42:36 fir-md1-s1 kernel: 48 May 01 23:42:36 fir-md1-s1 kernel: 81 May 01 23:42:36 fir-md1-s1 kernel: c2 May 01 23:42:36 fir-md1-s1 kernel: 80 May 01 23:42:36 fir-md1-s1 kernel: b7 May 01 23:42:36 fir-md1-s1 kernel: 01 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 48 May 01 23:42:36 fir-md1-s1 kernel: 03 May 01 23:42:36 fir-md1-s1 kernel: 14 May 01 23:42:36 fir-md1-s1 kernel: c5 May 01 23:42:36 fir-md1-s1 kernel: 60 May 01 23:42:36 fir-md1-s1 kernel: b9 May 01 23:42:36 fir-md1-s1 kernel: b4 May 01 23:42:36 fir-md1-s1 kernel: b7 May 01 23:42:36 fir-md1-s1 kernel: 4c May 01 23:42:36 fir-md1-s1 kernel: 89 May 01 23:42:36 fir-md1-s1 kernel: 02 May 01 23:42:36 fir-md1-s1 kernel: 41 May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 40 May 01 23:42:36 fir-md1-s1 kernel: 08 May 01 23:42:36 fir-md1-s1 kernel: 85 May 01 23:42:36 fir-md1-s1 kernel: c0 May 01 23:42:36 fir-md1-s1 kernel: 75 May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 1f May 01 23:42:36 fir-md1-s1 kernel: 44 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: 00 May 01 23:42:36 fir-md1-s1 kernel: f3 May 01 23:42:36 fir-md1-s1 kernel: 90 May 01 23:42:36 fir-md1-s1 kernel: <41> May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 40 May 01 23:42:36 fir-md1-s1 kernel: 08 May 01 23:42:36 fir-md1-s1 kernel: 85 May 01 23:42:36 fir-md1-s1 kernel: c0 May 01 23:42:36 fir-md1-s1 kernel: 74 May 01 23:42:36 fir-md1-s1 
kernel: f6 May 01 23:42:36 fir-md1-s1 kernel: 4d May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: 08 May 01 23:42:36 fir-md1-s1 kernel: 4d May 01 23:42:36 fir-md1-s1 kernel: 85 May 01 23:42:36 fir-md1-s1 kernel: c9 May 01 23:42:36 fir-md1-s1 kernel: 74 May 01 23:42:36 fir-md1-s1 kernel: 04 May 01 23:42:36 fir-md1-s1 kernel: 41 May 01 23:42:36 fir-md1-s1 kernel: 0f May 01 23:42:36 fir-md1-s1 kernel: 18 May 01 23:42:36 fir-md1-s1 kernel: 09 May 01 23:42:36 fir-md1-s1 kernel: 8b May 01 23:42:36 fir-md1-s1 kernel: May 01 23:42:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3ddfc0e1-d9a8-93ac-6e7d-3e2edb9b897f (at 10.8.0.65@o2ib6) reconnecting May 01 23:42:36 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 01 23:42:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.0.65@o2ib6) May 01 23:42:36 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages May 01 23:42:36 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:42:36 fir-md1-s1 kernel: mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] May 01 23:42:36 fir-md1-s1 kernel: CPU: 4 PID: 103101 Comm: mdt_io00_057 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:36 fir-md1-s1 kernel: Hardware name: 
Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:36 fir-md1-s1 kernel: task: ffff985c827130c0 ti: ffff985c1a30c000 task.ti: ffff985c1a30c000 May 01 23:42:36 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x1ce/0x200 May 01 23:42:36 fir-md1-s1 kernel: RSP: 0018:ffff985c1a30f800 EFLAGS: 00000202 May 01 23:42:36 fir-md1-s1 kernel: RAX: 0000000000000001 RBX: ffff9831703ce738 RCX: 0000000000000001 May 01 23:42:36 fir-md1-s1 kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff982c9fc8c480 May 01 23:42:36 fir-md1-s1 kernel: RBP: ffff985c1a30f800 R08: 0000000000000101 R09: ffffffffc1231d1a May 01 23:42:36 fir-md1-s1 kernel: R10: ffff982cfee5f140 R11: ffffde3ed5b1ce00 R12: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: R13: ffff985c1a30f7a0 R14: ffff9831703ce4a8 R15: 0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: FS: 00007f427f792740(0000) GS:ffff982cfee40000(0000) knlGS:0000000000000000 May 01 23:42:36 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:36 fir-md1-s1 kernel: CR2: 00007f427f58b000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:42:36 fir-md1-s1 kernel: Call Trace: May 01 23:42:36 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:36 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:42:36 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:42:36 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:36 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:42:36 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:42:36 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:42:36 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:36 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:36 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:36 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:36 fir-md1-s1 kernel: Code: 37 81 fe 00 01 00 00 74 f4 e9 93 fe ff ff 0f 1f 80 00 00 00 00 83 fa 01 75 11 0f 1f 00 e9 68 fe ff ff 0f 1f 00 85 c0 74 0c f3 90 <8b> 07 0f b6 c0 83 f8 03 75 f0 b8 01 00 00 00 66 89 07 5d c3 66 May 01 23:42:38 fir-md1-s1 kernel: Lustre: 102998:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984bdbe48c50 x1631547606839792/t0(0) o4->778be52b-e90c-a9e1-8d5b-9e961e103e4e@10.9.101.5@o2ib4:13/0 lens 6328/448 e 1 to 0 dl 1556779363 ref 2 fl Interpret:/0/0 rc 0/0 May 01 23:42:38 fir-md1-s1 kernel: Lustre: 102998:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 60 previous similar messages May 01 23:42:41 fir-md1-s1 kernel: Lustre: 102963:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff98269b6d3850 x1631550680416096/t279448167600(0) o4->da8b20c2-1617-543d-bea7-2bc4da319abc@10.9.112.7@o2ib4:10/0 lens 488/416 e 0 to 0 dl 1556779360 ref 1 fl Complete:/0/0 rc 0/0 May 01 23:42:41 fir-md1-s1 kernel: Lustre: 101380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556779340/real 1556779340] req@ffff98471df1fb00 x1632254603991360/t0(0) o101->fir-MDT0000-lwp-MDT0002@0@lo:23/10 lens 456/496 e 1 to 1 dl 1556779361 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 May 01 23:42:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fd395c74-7a26-632a-06a8-cecc6aa8caa0 (at 10.9.105.64@o2ib4) reconnecting May 01 23:42:43 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages May 01 23:42:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to cb9e5693-7c44-f40c-eac0-9f9482ccd7f6 (at 10.9.105.64@o2ib4) May 01 23:42:43 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages May 01 23:42:44 fir-md1-s1 kernel: Lustre: 103078:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated 
(30:1s); client may timeout. req@ffff983816f31050 x1631541998329168/t279448167813(0) o4->d4242da5-5a9c-4508-f9da-c1e7f36347f4@10.9.114.4@o2ib4:13/0 lens 488/416 e 0 to 0 dl 1556779363 ref 1 fl Complete:/0/0 rc 0/0 May 01 23:42:50 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#26 stuck for 23s! [mdt_io02_043:103027] May 01 23:42:50 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif May 01 23:42:50 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] May 01 23:42:50 fir-md1-s1 kernel: CPU: 26 PID: 103027 Comm: mdt_io02_043 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:50 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:50 fir-md1-s1 kernel: task: ffff982c812730c0 ti: ffff983a5ba80000 task.ti: ffff983a5ba80000 May 01 23:42:50 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 May 01 23:42:50 fir-md1-s1 kernel: RSP: 0018:ffff983a5ba83800 EFLAGS: 00000246 May 01 23:42:50 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff983165526b60 RCX: 0000000000d10000 May 01 23:42:50 fir-md1-s1 kernel: RDX: ffff982cfeedb780 RSI: 0000000000610101 RDI: ffff982c9fc8c480 May 01 23:42:50 fir-md1-s1 kernel: RBP: ffff983a5ba83800 R08: ffff984cff79b780 R09: 0000000000000000 May 01 23:42:50 fir-md1-s1 kernel: R10: ffff984cff79f140 R11: ffffde3f6b96dc00 R12: 0000000000000000 May 01 23:42:50 fir-md1-s1 kernel: R13: ffff983a5ba837a0 R14: ffff9831655268d0 R15: 0000000000000000 May 01 23:42:50 fir-md1-s1 kernel: FS: 00007fa424097780(0000) GS:ffff984cff780000(0000) knlGS:0000000000000000 May 01 23:42:50 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:50 fir-md1-s1 kernel: CR2: 00007fa4240a8000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:42:50 fir-md1-s1 kernel: Call Trace: May 01 23:42:50 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:50 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:50 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:50 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:42:50 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:42:50 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:42:50 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:42:50 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:42:50 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:50 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:50 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:50 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:42:50 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:42:50 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:42:50 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:50 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:50 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:50 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:50 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 May 01 23:42:55 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [mdt_io02_001:101733] May 01 23:42:55 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif May 01 23:42:55 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! 
[mdt_io03_043:103125] May 01 23:42:55 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas May 01 23:42:55 fir-md1-s1 kernel: Modules linked in: May 01 23:42:55 fir-md1-s1 kernel: osp(OE) May 01 23:42:55 fir-md1-s1 kernel: mdd(OE) May 01 23:42:55 fir-md1-s1 kernel: lod(OE) May 01 23:42:55 fir-md1-s1 kernel: mdt(OE) May 01 23:42:55 fir-md1-s1 kernel: lfsck(OE) May 01 23:42:55 fir-md1-s1 kernel: mgs(OE) May 01 23:42:55 fir-md1-s1 kernel: mgc(OE) May 01 23:42:55 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:42:55 fir-md1-s1 kernel: lquota(OE) May 01 23:42:55 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:42:55 fir-md1-s1 kernel: lustre(OE) May 01 23:42:55 fir-md1-s1 kernel: lmv(OE) May 01 23:42:55 fir-md1-s1 kernel: mdc(OE) May 01 23:42:55 fir-md1-s1 kernel: osc(OE) May 01 23:42:55 fir-md1-s1 kernel: lov(OE) May 01 23:42:55 fir-md1-s1 kernel: fid(OE) May 01 23:42:55 fir-md1-s1 kernel: fld(OE) May 01 23:42:55 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:42:55 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:42:55 fir-md1-s1 kernel: obdclass(OE) May 01 23:42:55 fir-md1-s1 kernel: lnet(OE) May 01 23:42:55 fir-md1-s1 kernel: libcfs(OE) May 01 23:42:55 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:42:55 fir-md1-s1 kernel: auth_rpcgss May 01 23:42:55 fir-md1-s1 kernel: nfsv4 May 01 23:42:55 fir-md1-s1 kernel: dns_resolver May 01 23:42:55 fir-md1-s1 kernel: nfs May 01 23:42:55 fir-md1-s1 kernel: lockd May 01 23:42:55 fir-md1-s1 kernel: grace May 01 23:42:55 fir-md1-s1 kernel: fscache May 01 23:42:55 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:42:55 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:42:55 fir-md1-s1 kernel: iw_cm(OE) May 01 23:42:55 
fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_cm(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_umad(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:42:55 fir-md1-s1 kernel: dell_rbu May 01 23:42:55 fir-md1-s1 kernel: sunrpc May 01 23:42:55 fir-md1-s1 kernel: vfat May 01 23:42:55 fir-md1-s1 kernel: fat May 01 23:42:55 fir-md1-s1 kernel: dm_round_robin May 01 23:42:55 fir-md1-s1 kernel: amd64_edac_mod May 01 23:42:55 fir-md1-s1 kernel: edac_mce_amd May 01 23:42:55 fir-md1-s1 kernel: kvm_amd May 01 23:42:55 fir-md1-s1 kernel: kvm May 01 23:42:55 fir-md1-s1 kernel: ses May 01 23:42:55 fir-md1-s1 kernel: irqbypass May 01 23:42:55 fir-md1-s1 kernel: crc32_pclmul May 01 23:42:55 fir-md1-s1 kernel: enclosure May 01 23:42:55 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:42:55 fir-md1-s1 kernel: dcdbas May 01 23:42:55 fir-md1-s1 kernel: aesni_intel May 01 23:42:55 fir-md1-s1 kernel: lrw May 01 23:42:55 fir-md1-s1 kernel: gf128mul May 01 23:42:55 fir-md1-s1 kernel: glue_helper May 01 23:42:55 fir-md1-s1 kernel: ablk_helper May 01 23:42:55 fir-md1-s1 kernel: cryptd May 01 23:42:55 fir-md1-s1 kernel: ipmi_si May 01 23:42:55 fir-md1-s1 kernel: pcspkr May 01 23:42:55 fir-md1-s1 kernel: ipmi_devintf May 01 23:42:55 fir-md1-s1 kernel: ccp May 01 23:42:55 fir-md1-s1 kernel: i2c_piix4 May 01 23:42:55 fir-md1-s1 kernel: dm_multipath May 01 23:42:55 fir-md1-s1 kernel: sg May 01 23:42:55 fir-md1-s1 kernel: k10temp May 01 23:42:55 fir-md1-s1 kernel: ipmi_msghandler May 01 23:42:55 fir-md1-s1 kernel: dm_mod May 01 23:42:55 fir-md1-s1 kernel: acpi_power_meter May 01 23:42:55 fir-md1-s1 kernel: knem(OE) May 01 23:42:55 fir-md1-s1 kernel: ip_tables May 01 23:42:55 fir-md1-s1 kernel: ext4 May 01 23:42:55 fir-md1-s1 kernel: mbcache May 01 23:42:55 fir-md1-s1 kernel: jbd2 May 01 23:42:55 fir-md1-s1 kernel: 
sd_mod May 01 23:42:55 fir-md1-s1 kernel: crc_t10dif May 01 23:42:55 fir-md1-s1 kernel: crct10dif_generic May 01 23:42:55 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_core(OE) May 01 23:42:55 fir-md1-s1 kernel: i2c_algo_bit May 01 23:42:55 fir-md1-s1 kernel: drm_kms_helper May 01 23:42:55 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:42:55 fir-md1-s1 kernel: syscopyarea May 01 23:42:55 fir-md1-s1 kernel: sysfillrect May 01 23:42:55 fir-md1-s1 kernel: sysimgblt May 01 23:42:55 fir-md1-s1 kernel: fb_sys_fops May 01 23:42:55 fir-md1-s1 kernel: mlxfw(OE) May 01 23:42:55 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:42:55 fir-md1-s1 kernel: ttm May 01 23:42:55 fir-md1-s1 kernel: devlink May 01 23:42:55 fir-md1-s1 kernel: ahci May 01 23:42:55 fir-md1-s1 kernel: crct10dif_common May 01 23:42:55 fir-md1-s1 kernel: libahci May 01 23:42:55 fir-md1-s1 kernel: drm May 01 23:42:55 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:42:55 fir-md1-s1 kernel: tg3 May 01 23:42:55 fir-md1-s1 kernel: crc32c_intel May 01 23:42:55 fir-md1-s1 kernel: libata May 01 23:42:55 fir-md1-s1 kernel: megaraid_sas May 01 23:42:55 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:42:55 fir-md1-s1 kernel: ptp May 01 23:42:55 fir-md1-s1 kernel: pps_core May 01 23:42:55 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:42:55 fir-md1-s1 kernel: raid_class May 01 23:42:55 fir-md1-s1 kernel: scsi_transport_sas May 01 23:42:55 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:42:55 fir-md1-s1 kernel: May 01 23:42:55 fir-md1-s1 kernel: CPU: 15 PID: 103125 Comm: mdt_io03_043 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:55 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:55 fir-md1-s1 kernel: task: ffff985912f64100 ti: ffff9858407d0000 task.ti: ffff9858407d0000 May 01 23:42:55 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:55 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:42:55 fir-md1-s1 kernel: RSP: 0018:ffff9858407d38e8 EFLAGS: 00000246 May 01 23:42:55 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff984851f7a378 RCX: 0000000000790000 May 01 23:42:55 fir-md1-s1 kernel: RDX: ffff982cff01b780 RSI: 0000000001010101 RDI: ffff982c9fc8c480 May 01 23:42:55 fir-md1-s1 kernel: RBP: ffff9858407d38e8 R08: ffff985d3f4db780 R09: 0000000000000000 May 01 23:42:55 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff985d3f55ac00 May 01 23:42:55 fir-md1-s1 kernel: R13: ffff985912f64168 R14: 00ff9858407d3850 R15: ffff984cf34000a0 May 01 23:42:55 fir-md1-s1 kernel: FS: 00007f63c1c68740(0000) GS:ffff985d3f4c0000(0000) knlGS:0000000000000000 May 01 23:42:55 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:55 fir-md1-s1 kernel: CR2: 00007ff884d9fd1c CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:42:55 fir-md1-s1 kernel: Call Trace: May 01 23:42:55 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:55 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:55 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:42:55 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:55 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? load_balance+0x178/0x9a0 May 01 23:42:55 fir-md1-s1 kernel: [] ? 
update_curr+0x14c/0x1e0 May 01 23:42:55 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:42:55 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:42:55 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:55 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:55 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:55 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:55 fir-md1-s1 kernel: Code: May 01 23:42:55 fir-md1-s1 kernel: 13 May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: c1 May 01 23:42:55 fir-md1-s1 kernel: ea May 01 23:42:55 fir-md1-s1 kernel: 0d May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 98 May 01 23:42:55 fir-md1-s1 kernel: 83 May 01 23:42:55 fir-md1-s1 kernel: e2 May 01 23:42:55 fir-md1-s1 kernel: 30 May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 81 May 01 23:42:55 fir-md1-s1 kernel: c2 May 01 23:42:55 fir-md1-s1 kernel: 80 May 01 23:42:55 fir-md1-s1 kernel: b7 May 01 23:42:55 fir-md1-s1 kernel: 01 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 03 May 01 23:42:55 fir-md1-s1 kernel: 14 May 01 23:42:55 fir-md1-s1 kernel: c5 May 01 23:42:55 fir-md1-s1 kernel: 60 May 01 23:42:55 fir-md1-s1 kernel: b9 May 01 23:42:55 fir-md1-s1 kernel: b4 May 01 23:42:55 fir-md1-s1 kernel: b7 May 01 23:42:55 fir-md1-s1 kernel: 4c May 01 23:42:55 fir-md1-s1 kernel: 89 May 01 23:42:55 fir-md1-s1 kernel: 02 May 01 23:42:55 fir-md1-s1 kernel: 41 May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 40 May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 kernel: 85 May 01 23:42:55 fir-md1-s1 kernel: c0 May 01 23:42:55 fir-md1-s1 kernel: 75 May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 1f May 01 23:42:55 fir-md1-s1 kernel: 44 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: f3 May 01 23:42:55 fir-md1-s1 kernel: 90 May 01 23:42:55 fir-md1-s1 kernel: <41> May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 40 May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 kernel: 85 May 01 23:42:55 fir-md1-s1 kernel: c0 May 01 23:42:55 fir-md1-s1 kernel: 74 May 01 23:42:55 fir-md1-s1 
kernel: f6 May 01 23:42:55 fir-md1-s1 kernel: 4d May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 kernel: 4d May 01 23:42:55 fir-md1-s1 kernel: 85 May 01 23:42:55 fir-md1-s1 kernel: c9 May 01 23:42:55 fir-md1-s1 kernel: 74 May 01 23:42:55 fir-md1-s1 kernel: 04 May 01 23:42:55 fir-md1-s1 kernel: 41 May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 18 May 01 23:42:55 fir-md1-s1 kernel: 09 May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: May 01 23:42:55 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [mdt_io01_085:103094] May 01 23:42:55 fir-md1-s1 kernel: Modules linked in: May 01 23:42:55 fir-md1-s1 kernel: osp(OE) May 01 23:42:55 fir-md1-s1 kernel: mdd(OE) May 01 23:42:55 fir-md1-s1 kernel: lod(OE) May 01 23:42:55 fir-md1-s1 kernel: mdt(OE) May 01 23:42:55 fir-md1-s1 kernel: lfsck(OE) May 01 23:42:55 fir-md1-s1 kernel: mgs(OE) May 01 23:42:55 fir-md1-s1 kernel: mgc(OE) May 01 23:42:55 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:42:55 fir-md1-s1 kernel: lquota(OE) May 01 23:42:55 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:42:55 fir-md1-s1 kernel: lustre(OE) May 01 23:42:55 fir-md1-s1 kernel: lmv(OE) May 01 23:42:55 fir-md1-s1 kernel: mdc(OE) May 01 23:42:55 fir-md1-s1 kernel: osc(OE) May 01 23:42:55 fir-md1-s1 kernel: lov(OE) May 01 23:42:55 fir-md1-s1 kernel: fid(OE) May 01 23:42:55 fir-md1-s1 kernel: fld(OE) May 01 23:42:55 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:42:55 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:42:55 fir-md1-s1 kernel: obdclass(OE) May 01 23:42:55 fir-md1-s1 kernel: lnet(OE) May 01 23:42:55 fir-md1-s1 kernel: libcfs(OE) May 01 23:42:55 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:42:55 fir-md1-s1 kernel: auth_rpcgss May 01 23:42:55 fir-md1-s1 kernel: nfsv4 May 01 23:42:55 fir-md1-s1 kernel: dns_resolver May 01 23:42:55 fir-md1-s1 kernel: nfs May 01 23:42:55 fir-md1-s1 kernel: lockd May 01 23:42:55 fir-md1-s1 kernel: grace 
May 01 23:42:55 fir-md1-s1 kernel: fscache May 01 23:42:55 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:42:55 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:42:55 fir-md1-s1 kernel: iw_cm(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_cm(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_umad(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:42:55 fir-md1-s1 kernel: dell_rbu May 01 23:42:55 fir-md1-s1 kernel: sunrpc May 01 23:42:55 fir-md1-s1 kernel: vfat May 01 23:42:55 fir-md1-s1 kernel: fat May 01 23:42:55 fir-md1-s1 kernel: dm_round_robin May 01 23:42:55 fir-md1-s1 kernel: amd64_edac_mod May 01 23:42:55 fir-md1-s1 kernel: edac_mce_amd May 01 23:42:55 fir-md1-s1 kernel: kvm_amd May 01 23:42:55 fir-md1-s1 kernel: kvm May 01 23:42:55 fir-md1-s1 kernel: ses May 01 23:42:55 fir-md1-s1 kernel: irqbypass May 01 23:42:55 fir-md1-s1 kernel: crc32_pclmul May 01 23:42:55 fir-md1-s1 kernel: enclosure May 01 23:42:55 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:42:55 fir-md1-s1 kernel: dcdbas May 01 23:42:55 fir-md1-s1 kernel: aesni_intel May 01 23:42:55 fir-md1-s1 kernel: lrw May 01 23:42:55 fir-md1-s1 kernel: gf128mul May 01 23:42:55 fir-md1-s1 kernel: glue_helper May 01 23:42:55 fir-md1-s1 kernel: ablk_helper May 01 23:42:55 fir-md1-s1 kernel: cryptd May 01 23:42:55 fir-md1-s1 kernel: ipmi_si May 01 23:42:55 fir-md1-s1 kernel: pcspkr May 01 23:42:55 fir-md1-s1 kernel: ipmi_devintf May 01 23:42:55 fir-md1-s1 kernel: ccp May 01 23:42:55 fir-md1-s1 kernel: i2c_piix4 May 01 23:42:55 fir-md1-s1 kernel: dm_multipath May 01 23:42:55 fir-md1-s1 kernel: sg May 01 23:42:55 fir-md1-s1 kernel: k10temp May 01 23:42:55 fir-md1-s1 kernel: ipmi_msghandler May 01 23:42:55 fir-md1-s1 kernel: dm_mod May 01 23:42:55 fir-md1-s1 kernel: acpi_power_meter May 01 
23:42:55 fir-md1-s1 kernel: knem(OE) May 01 23:42:55 fir-md1-s1 kernel: ip_tables May 01 23:42:55 fir-md1-s1 kernel: ext4 May 01 23:42:55 fir-md1-s1 kernel: mbcache May 01 23:42:55 fir-md1-s1 kernel: jbd2 May 01 23:42:55 fir-md1-s1 kernel: sd_mod May 01 23:42:55 fir-md1-s1 kernel: crc_t10dif May 01 23:42:55 fir-md1-s1 kernel: crct10dif_generic May 01 23:42:55 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_core(OE) May 01 23:42:55 fir-md1-s1 kernel: i2c_algo_bit May 01 23:42:55 fir-md1-s1 kernel: drm_kms_helper May 01 23:42:55 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:42:55 fir-md1-s1 kernel: syscopyarea May 01 23:42:55 fir-md1-s1 kernel: sysfillrect May 01 23:42:55 fir-md1-s1 kernel: sysimgblt May 01 23:42:55 fir-md1-s1 kernel: fb_sys_fops May 01 23:42:55 fir-md1-s1 kernel: mlxfw(OE) May 01 23:42:55 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:42:55 fir-md1-s1 kernel: ttm May 01 23:42:55 fir-md1-s1 kernel: devlink May 01 23:42:55 fir-md1-s1 kernel: ahci May 01 23:42:55 fir-md1-s1 kernel: crct10dif_common May 01 23:42:55 fir-md1-s1 kernel: libahci May 01 23:42:55 fir-md1-s1 kernel: drm May 01 23:42:55 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:42:55 fir-md1-s1 kernel: tg3 May 01 23:42:55 fir-md1-s1 kernel: crc32c_intel May 01 23:42:55 fir-md1-s1 kernel: libata May 01 23:42:55 fir-md1-s1 kernel: megaraid_sas May 01 23:42:55 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:42:55 fir-md1-s1 kernel: ptp May 01 23:42:55 fir-md1-s1 kernel: pps_core May 01 23:42:55 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:42:55 fir-md1-s1 kernel: raid_class May 01 23:42:55 fir-md1-s1 kernel: scsi_transport_sas May 01 23:42:55 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:42:55 fir-md1-s1 kernel: May 01 23:42:55 fir-md1-s1 kernel: CPU: 21 PID: 103094 Comm: mdt_io01_085 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:55 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:55 fir-md1-s1 kernel: task: ffff982c9ff3c100 ti: ffff98596d728000 task.ti: ffff98596d728000 May 01 23:42:55 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:55 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x126/0x200 May 01 23:42:55 fir-md1-s1 kernel: RSP: 0018:ffff98596d72b8e8 EFLAGS: 00000246 May 01 23:42:55 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff98267ca42b78 RCX: 0000000000a90000 May 01 23:42:55 fir-md1-s1 kernel: RDX: ffff984cff79b780 RSI: 0000000000d10101 RDI: ffff982c9fc8c480 May 01 23:42:55 fir-md1-s1 kernel: RBP: ffff98596d72b8e8 R08: ffff983cff75b780 R09: 0000000000000000 May 01 23:42:55 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff983cff65ac00 May 01 23:42:55 fir-md1-s1 kernel: R13: ffff982c9ff3c168 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:42:55 fir-md1-s1 kernel: FS: 00007fad68bc1880(0000) GS:ffff983cff740000(0000) knlGS:0000000000000000 May 01 23:42:55 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:55 fir-md1-s1 kernel: CR2: 00007ffcc3425f98 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:42:55 fir-md1-s1 kernel: Call Trace: May 01 23:42:55 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:55 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:55 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:42:55 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:55 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? 
lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:42:55 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:42:55 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:42:55 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:55 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:55 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:55 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:55 fir-md1-s1 kernel: Code: May 01 23:42:55 fir-md1-s1 kernel: 0d May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 98 May 01 23:42:55 fir-md1-s1 kernel: 83 May 01 23:42:55 fir-md1-s1 kernel: e2 May 01 23:42:55 fir-md1-s1 kernel: 30 May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 81 May 01 23:42:55 fir-md1-s1 kernel: c2 May 01 23:42:55 fir-md1-s1 kernel: 80 May 01 23:42:55 fir-md1-s1 kernel: b7 May 01 23:42:55 fir-md1-s1 kernel: 01 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 03 May 01 23:42:55 fir-md1-s1 kernel: 14 May 01 23:42:55 fir-md1-s1 kernel: c5 May 01 23:42:55 fir-md1-s1 kernel: 60 May 01 23:42:55 fir-md1-s1 kernel: b9 May 01 23:42:55 fir-md1-s1 kernel: b4 May 01 23:42:55 fir-md1-s1 kernel: b7 May 01 23:42:55 fir-md1-s1 kernel: 4c May 01 23:42:55 fir-md1-s1 kernel: 89 May 01 23:42:55 fir-md1-s1 kernel: 02 May 01 23:42:55 fir-md1-s1 kernel: 41 May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 40 May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 kernel: 85 May 01 23:42:55 fir-md1-s1 kernel: c0 May 01 23:42:55 fir-md1-s1 kernel: 75 May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 1f May 01 23:42:55 fir-md1-s1 kernel: 44 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: f3 May 01 23:42:55 fir-md1-s1 kernel: 90 May 01 23:42:55 fir-md1-s1 kernel: 41 May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 40 May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 kernel: <85> May 01 23:42:55 fir-md1-s1 kernel: c0 May 01 23:42:55 fir-md1-s1 kernel: 74 May 01 23:42:55 fir-md1-s1 kernel: f6 May 01 23:42:55 fir-md1-s1 kernel: 4d May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 
kernel: 4d May 01 23:42:55 fir-md1-s1 kernel: 85 May 01 23:42:55 fir-md1-s1 kernel: c9 May 01 23:42:55 fir-md1-s1 kernel: 74 May 01 23:42:55 fir-md1-s1 kernel: 04 May 01 23:42:55 fir-md1-s1 kernel: 41 May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 18 May 01 23:42:55 fir-md1-s1 kernel: 09 May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 17 May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: b7 May 01 23:42:55 fir-md1-s1 kernel: c2 May 01 23:42:55 fir-md1-s1 kernel: May 01 23:42:55 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#32 stuck for 23s! [mdt_io00_072:103262] May 01 23:42:55 fir-md1-s1 kernel: Modules linked in: May 01 23:42:55 fir-md1-s1 kernel: osp(OE) May 01 23:42:55 fir-md1-s1 kernel: mdd(OE) May 01 23:42:55 fir-md1-s1 kernel: lod(OE) May 01 23:42:55 fir-md1-s1 kernel: mdt(OE) May 01 23:42:55 fir-md1-s1 kernel: lfsck(OE) May 01 23:42:55 fir-md1-s1 kernel: mgs(OE) May 01 23:42:55 fir-md1-s1 kernel: mgc(OE) May 01 23:42:55 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:42:55 fir-md1-s1 kernel: lquota(OE) May 01 23:42:55 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:42:55 fir-md1-s1 kernel: lustre(OE) May 01 23:42:55 fir-md1-s1 kernel: lmv(OE) May 01 23:42:55 fir-md1-s1 kernel: mdc(OE) May 01 23:42:55 fir-md1-s1 kernel: osc(OE) May 01 23:42:55 fir-md1-s1 kernel: lov(OE) May 01 23:42:55 fir-md1-s1 kernel: fid(OE) May 01 23:42:55 fir-md1-s1 kernel: fld(OE) May 01 23:42:55 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:42:55 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:42:55 fir-md1-s1 kernel: obdclass(OE) May 01 23:42:55 fir-md1-s1 kernel: lnet(OE) May 01 23:42:55 fir-md1-s1 kernel: libcfs(OE) May 01 23:42:55 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:42:55 fir-md1-s1 kernel: auth_rpcgss May 01 23:42:55 fir-md1-s1 kernel: nfsv4 May 01 23:42:55 fir-md1-s1 kernel: dns_resolver May 01 23:42:55 fir-md1-s1 kernel: nfs May 01 23:42:55 fir-md1-s1 kernel: lockd May 01 23:42:55 fir-md1-s1 kernel: grace 
May 01 23:42:55 fir-md1-s1 kernel: fscache May 01 23:42:55 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:42:55 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:42:55 fir-md1-s1 kernel: iw_cm(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_cm(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_umad(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:42:55 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:42:55 fir-md1-s1 kernel: dell_rbu May 01 23:42:55 fir-md1-s1 kernel: sunrpc May 01 23:42:55 fir-md1-s1 kernel: vfat May 01 23:42:55 fir-md1-s1 kernel: fat May 01 23:42:55 fir-md1-s1 kernel: dm_round_robin May 01 23:42:55 fir-md1-s1 kernel: amd64_edac_mod May 01 23:42:55 fir-md1-s1 kernel: edac_mce_amd May 01 23:42:55 fir-md1-s1 kernel: kvm_amd May 01 23:42:55 fir-md1-s1 kernel: kvm May 01 23:42:55 fir-md1-s1 kernel: ses May 01 23:42:55 fir-md1-s1 kernel: irqbypass May 01 23:42:55 fir-md1-s1 kernel: crc32_pclmul May 01 23:42:55 fir-md1-s1 kernel: enclosure May 01 23:42:55 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:42:55 fir-md1-s1 kernel: dcdbas May 01 23:42:55 fir-md1-s1 kernel: aesni_intel May 01 23:42:55 fir-md1-s1 kernel: lrw May 01 23:42:55 fir-md1-s1 kernel: gf128mul May 01 23:42:55 fir-md1-s1 kernel: glue_helper May 01 23:42:55 fir-md1-s1 kernel: ablk_helper May 01 23:42:55 fir-md1-s1 kernel: cryptd May 01 23:42:55 fir-md1-s1 kernel: ipmi_si May 01 23:42:55 fir-md1-s1 kernel: pcspkr May 01 23:42:55 fir-md1-s1 kernel: ipmi_devintf May 01 23:42:55 fir-md1-s1 kernel: ccp May 01 23:42:55 fir-md1-s1 kernel: i2c_piix4 May 01 23:42:55 fir-md1-s1 kernel: dm_multipath May 01 23:42:55 fir-md1-s1 kernel: sg May 01 23:42:55 fir-md1-s1 kernel: k10temp May 01 23:42:55 fir-md1-s1 kernel: ipmi_msghandler May 01 23:42:55 fir-md1-s1 kernel: dm_mod May 01 23:42:55 fir-md1-s1 kernel: acpi_power_meter May 01 
23:42:55 fir-md1-s1 kernel: knem(OE) May 01 23:42:55 fir-md1-s1 kernel: ip_tables May 01 23:42:55 fir-md1-s1 kernel: ext4 May 01 23:42:55 fir-md1-s1 kernel: mbcache May 01 23:42:55 fir-md1-s1 kernel: jbd2 May 01 23:42:55 fir-md1-s1 kernel: sd_mod May 01 23:42:55 fir-md1-s1 kernel: crc_t10dif May 01 23:42:55 fir-md1-s1 kernel: crct10dif_generic May 01 23:42:55 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:42:55 fir-md1-s1 kernel: ib_core(OE) May 01 23:42:55 fir-md1-s1 kernel: i2c_algo_bit May 01 23:42:55 fir-md1-s1 kernel: drm_kms_helper May 01 23:42:55 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:42:55 fir-md1-s1 kernel: syscopyarea May 01 23:42:55 fir-md1-s1 kernel: sysfillrect May 01 23:42:55 fir-md1-s1 kernel: sysimgblt May 01 23:42:55 fir-md1-s1 kernel: fb_sys_fops May 01 23:42:55 fir-md1-s1 kernel: mlxfw(OE) May 01 23:42:55 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:42:55 fir-md1-s1 kernel: ttm May 01 23:42:55 fir-md1-s1 kernel: devlink May 01 23:42:55 fir-md1-s1 kernel: ahci May 01 23:42:55 fir-md1-s1 kernel: crct10dif_common May 01 23:42:55 fir-md1-s1 kernel: libahci May 01 23:42:55 fir-md1-s1 kernel: drm May 01 23:42:55 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:42:55 fir-md1-s1 kernel: tg3 May 01 23:42:55 fir-md1-s1 kernel: crc32c_intel May 01 23:42:55 fir-md1-s1 kernel: libata May 01 23:42:55 fir-md1-s1 kernel: megaraid_sas May 01 23:42:55 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:42:55 fir-md1-s1 kernel: ptp May 01 23:42:55 fir-md1-s1 kernel: pps_core May 01 23:42:55 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:42:55 fir-md1-s1 kernel: raid_class May 01 23:42:55 fir-md1-s1 kernel: scsi_transport_sas May 01 23:42:55 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:42:55 fir-md1-s1 kernel: May 01 23:42:55 fir-md1-s1 kernel: CPU: 32 PID: 103262 Comm: mdt_io00_072 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:55 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:55 fir-md1-s1 kernel: task: ffff984cba239040 ti: ffff982bc9ed4000 task.ti: ffff982bc9ed4000 May 01 23:42:55 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:42:55 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:42:55 fir-md1-s1 kernel: RSP: 0018:ffff982bc9ed78e8 EFLAGS: 00000246 May 01 23:42:55 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff9837d662bb78 RCX: 0000000001010000 May 01 23:42:55 fir-md1-s1 kernel: RDX: ffff983cff75b780 RSI: 0000000000a90101 RDI: ffff982c9fc8c480 May 01 23:42:55 fir-md1-s1 kernel: RBP: ffff982bc9ed78e8 R08: ffff982cff01b780 R09: 0000000000000000 May 01 23:42:55 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff982cfef9ac00 May 01 23:42:55 fir-md1-s1 kernel: R13: ffff984cba2390a8 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:42:55 fir-md1-s1 kernel: FS: 00007f1fd5eaa700(0000) GS:ffff982cff000000(0000) knlGS:0000000000000000 May 01 23:42:55 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:55 fir-md1-s1 kernel: CR2: 00007f41ebfcd1b0 CR3: 0000001038b88000 CR4: 00000000003407e0 May 01 23:42:55 fir-md1-s1 kernel: Call Trace: May 01 23:42:55 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:55 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:55 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:42:55 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:55 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? load_balance+0x178/0x9a0 May 01 23:42:55 fir-md1-s1 kernel: [] ? 
update_curr+0x14c/0x1e0 May 01 23:42:55 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:42:55 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:42:55 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:55 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:55 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:55 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:55 fir-md1-s1 kernel: Code: May 01 23:42:55 fir-md1-s1 kernel: 13 May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: c1 May 01 23:42:55 fir-md1-s1 kernel: ea May 01 23:42:55 fir-md1-s1 kernel: 0d May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 98 May 01 23:42:55 fir-md1-s1 kernel: 83 May 01 23:42:55 fir-md1-s1 kernel: e2 May 01 23:42:55 fir-md1-s1 kernel: 30 May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 81 May 01 23:42:55 fir-md1-s1 kernel: c2 May 01 23:42:55 fir-md1-s1 kernel: 80 May 01 23:42:55 fir-md1-s1 kernel: b7 May 01 23:42:55 fir-md1-s1 kernel: 01 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: 48 May 01 23:42:55 fir-md1-s1 kernel: 03 May 01 23:42:55 fir-md1-s1 kernel: 14 May 01 23:42:55 fir-md1-s1 kernel: c5 May 01 23:42:55 fir-md1-s1 kernel: 60 May 01 23:42:55 fir-md1-s1 kernel: b9 May 01 23:42:55 fir-md1-s1 kernel: b4 May 01 23:42:55 fir-md1-s1 kernel: b7 May 01 23:42:55 fir-md1-s1 kernel: 4c May 01 23:42:55 fir-md1-s1 kernel: 89 May 01 23:42:55 fir-md1-s1 kernel: 02 May 01 23:42:55 fir-md1-s1 kernel: 41 May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 40 May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 kernel: 85 May 01 23:42:55 fir-md1-s1 kernel: c0 May 01 23:42:55 fir-md1-s1 kernel: 75 May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 1f May 01 23:42:55 fir-md1-s1 kernel: 44 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: 00 May 01 23:42:55 fir-md1-s1 kernel: f3 May 01 23:42:55 fir-md1-s1 kernel: 90 May 01 23:42:55 fir-md1-s1 kernel: <41> May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 40 May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 kernel: 85 May 01 23:42:55 fir-md1-s1 kernel: c0 May 01 23:42:55 fir-md1-s1 kernel: 74 May 01 23:42:55 fir-md1-s1 
kernel: f6 May 01 23:42:55 fir-md1-s1 kernel: 4d May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: 08 May 01 23:42:55 fir-md1-s1 kernel: 4d May 01 23:42:55 fir-md1-s1 kernel: 85 May 01 23:42:55 fir-md1-s1 kernel: c9 May 01 23:42:55 fir-md1-s1 kernel: 74 May 01 23:42:55 fir-md1-s1 kernel: 04 May 01 23:42:55 fir-md1-s1 kernel: 41 May 01 23:42:55 fir-md1-s1 kernel: 0f May 01 23:42:55 fir-md1-s1 kernel: 18 May 01 23:42:55 fir-md1-s1 kernel: 09 May 01 23:42:55 fir-md1-s1 kernel: 8b May 01 23:42:55 fir-md1-s1 kernel: May 01 23:42:55 fir-md1-s1 kernel: Lustre: 103102:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9826a0209450 x1631546314486880/t0(0) o4->1f7bbeda-f291-d2ba-e680-a24cad2ce97f@10.9.104.23@o2ib4:29/0 lens 944/448 e 0 to 0 dl 1556779379 ref 2 fl Interpret:/2/0 rc 0/0 May 01 23:42:55 fir-md1-s1 kernel: Lustre: 103102:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 139 previous similar messages May 01 23:42:55 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:42:55 fir-md1-s1 kernel: May 01 23:42:55 fir-md1-s1 kernel: CPU: 2 PID: 101733 Comm: mdt_io02_001 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:42:55 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:42:55 fir-md1-s1 kernel: task: ffff982cf0b630c0 ti: ffff984cfa370000 task.ti: ffff984cfa370000 May 01 23:42:55 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:42:55 fir-md1-s1 kernel: RSP: 0018:ffff984cfa3738e8 EFLAGS: 00000246 May 01 23:42:55 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff985c7a711b78 RCX: 0000000000110000 May 01 23:42:55 fir-md1-s1 kernel: RDX: ffff985d3f4db780 RSI: 0000000000790101 RDI: ffff982c9fc8c480 May 01 23:42:55 fir-md1-s1 kernel: RBP: ffff984cfa3738e8 R08: ffff984cff61b780 R09: 0000000000000000 May 01 23:42:55 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff984cff61ac00 May 01 23:42:55 fir-md1-s1 kernel: R13: ffff982cf0b63128 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:42:55 fir-md1-s1 kernel: FS: 00007f759e3eb740(0000) GS:ffff984cff600000(0000) knlGS:0000000000000000 May 01 23:42:55 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:42:55 fir-md1-s1 kernel: CR2: 00007f759e3fa000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:42:55 fir-md1-s1 kernel: Call Trace: May 01 23:42:55 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:42:55 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:42:55 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:42:55 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:42:55 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:42:55 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? 
lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:42:55 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:42:55 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:42:55 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:42:55 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:42:55 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:42:55 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:42:55 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:42:55 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:42:55 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:42:55 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:42:55 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b May 01 23:42:56 fir-md1-s1 kernel: Lustre: 101352:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556779354/real 1556779354] req@ffff98239efdcb00 x1632254604122352/t0(0) o601->fir-MDT0000-lwp-MDT0002@0@lo:23/10 lens 336/336 e 1 to 1 dl 1556779375 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 May 01 23:42:56 fir-md1-s1 kernel: Lustre: 101352:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 01 23:42:56 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete May 01 23:42:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID May 01 23:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d2ab40ab-8888-3abb-75f9-9c32b2196967 (at 10.8.26.26@o2ib6) reconnecting May 01 23:42:59 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages May 01 23:42:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.26.26@o2ib6) May 01 23:42:59 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages May 01 23:43:04 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [mdt_io00_057:103101] May 01 23:43:04 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! 
[mdt_io01_029:102923] May 01 23:43:04 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) May 01 23:43:04 fir-md1-s1 kernel: Modules linked in: May 01 23:43:04 fir-md1-s1 kernel: osp(OE) May 01 23:43:04 fir-md1-s1 kernel: mdd(OE) May 01 23:43:04 fir-md1-s1 kernel: lod(OE) May 01 23:43:04 fir-md1-s1 kernel: mdt(OE) May 01 23:43:04 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:04 fir-md1-s1 kernel: mgs(OE) May 01 23:43:04 fir-md1-s1 kernel: mgc(OE) May 01 23:43:04 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lquota(OE) May 01 23:43:04 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lustre(OE) May 01 23:43:04 fir-md1-s1 kernel: lmv(OE) May 01 23:43:04 fir-md1-s1 kernel: mdc(OE) May 01 23:43:04 fir-md1-s1 kernel: osc(OE) May 01 23:43:04 fir-md1-s1 kernel: lov(OE) May 01 23:43:04 fir-md1-s1 kernel: fid(OE) May 01 23:43:04 fir-md1-s1 kernel: fld(OE) May 01 23:43:04 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:04 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:04 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:04 fir-md1-s1 kernel: lnet(OE) May 01 23:43:04 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:04 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:04 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:04 fir-md1-s1 kernel: nfsv4 May 01 23:43:04 fir-md1-s1 kernel: dns_resolver May 01 23:43:04 fir-md1-s1 kernel: nfs May 01 23:43:04 fir-md1-s1 kernel: lockd May 01 23:43:04 fir-md1-s1 kernel: grace May 01 23:43:04 fir-md1-s1 kernel: fscache May 01 23:43:04 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:04 fir-md1-s1 
kernel: iw_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:04 fir-md1-s1 kernel: dell_rbu May 01 23:43:04 fir-md1-s1 kernel: sunrpc May 01 23:43:04 fir-md1-s1 kernel: vfat May 01 23:43:04 fir-md1-s1 kernel: fat May 01 23:43:04 fir-md1-s1 kernel: dm_round_robin May 01 23:43:04 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:04 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:04 fir-md1-s1 kernel: kvm_amd May 01 23:43:04 fir-md1-s1 kernel: kvm May 01 23:43:04 fir-md1-s1 kernel: ses May 01 23:43:04 fir-md1-s1 kernel: irqbypass May 01 23:43:04 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:04 fir-md1-s1 kernel: enclosure May 01 23:43:04 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:04 fir-md1-s1 kernel: dcdbas May 01 23:43:04 fir-md1-s1 kernel: aesni_intel May 01 23:43:04 fir-md1-s1 kernel: lrw May 01 23:43:04 fir-md1-s1 kernel: gf128mul May 01 23:43:04 fir-md1-s1 kernel: glue_helper May 01 23:43:04 fir-md1-s1 kernel: ablk_helper May 01 23:43:04 fir-md1-s1 kernel: cryptd May 01 23:43:04 fir-md1-s1 kernel: ipmi_si May 01 23:43:04 fir-md1-s1 kernel: pcspkr May 01 23:43:04 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:04 fir-md1-s1 kernel: ccp May 01 23:43:04 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:04 fir-md1-s1 kernel: dm_multipath May 01 23:43:04 fir-md1-s1 kernel: sg May 01 23:43:04 fir-md1-s1 kernel: k10temp May 01 23:43:04 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:04 fir-md1-s1 kernel: dm_mod May 01 23:43:04 fir-md1-s1 kernel: acpi_power_meter May 01 23:43:04 fir-md1-s1 kernel: knem(OE) May 01 23:43:04 fir-md1-s1 kernel: ip_tables May 01 23:43:04 fir-md1-s1 kernel: ext4 May 01 23:43:04 fir-md1-s1 kernel: mbcache May 01 23:43:04 fir-md1-s1 kernel: jbd2 May 
01 23:43:04 fir-md1-s1 kernel: sd_mod May 01 23:43:04 fir-md1-s1 kernel: crc_t10dif May 01 23:43:04 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:04 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:04 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:04 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:04 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:04 fir-md1-s1 kernel: syscopyarea May 01 23:43:04 fir-md1-s1 kernel: sysfillrect May 01 23:43:04 fir-md1-s1 kernel: sysimgblt May 01 23:43:04 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:04 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:04 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:04 fir-md1-s1 kernel: ttm May 01 23:43:04 fir-md1-s1 kernel: devlink May 01 23:43:04 fir-md1-s1 kernel: ahci May 01 23:43:04 fir-md1-s1 kernel: crct10dif_common May 01 23:43:04 fir-md1-s1 kernel: libahci May 01 23:43:04 fir-md1-s1 kernel: drm May 01 23:43:04 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:04 fir-md1-s1 kernel: tg3 May 01 23:43:04 fir-md1-s1 kernel: crc32c_intel May 01 23:43:04 fir-md1-s1 kernel: libata May 01 23:43:04 fir-md1-s1 kernel: megaraid_sas May 01 23:43:04 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:04 fir-md1-s1 kernel: ptp May 01 23:43:04 fir-md1-s1 kernel: pps_core May 01 23:43:04 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:04 fir-md1-s1 kernel: raid_class May 01 23:43:04 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:04 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: CPU: 9 PID: 102923 Comm: mdt_io01_029 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:04 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:04 fir-md1-s1 kernel: task: ffff985c7cfbd140 ti: ffff985cbe4d8000 task.ti: ffff985cbe4d8000 May 01 23:43:04 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:04 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x15e/0x200 May 01 23:43:04 fir-md1-s1 kernel: RSP: 0018:ffff985cbe4db800 EFLAGS: 00000212 May 01 23:43:04 fir-md1-s1 kernel: RAX: 0000000000000101 RBX: ffff983165105ac0 RCX: 0000000000490000 May 01 23:43:04 fir-md1-s1 kernel: RDX: 0000000000190101 RSI: 0000000000000101 RDI: ffff982c9fc8c480 May 01 23:43:04 fir-md1-s1 kernel: RBP: ffff985cbe4db800 R08: ffff983cff69b780 R09: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R10: ffff983cff69f140 R11: ffffde3f18cce000 R12: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R13: ffff985cbe4db7a0 R14: ffff983165105830 R15: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: FS: 00007fddcbed4880(0000) GS:ffff983cff680000(0000) knlGS:0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:04 fir-md1-s1 kernel: CR2: 00007f50e83bb000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:04 fir-md1-s1 kernel: Call Trace: May 01 23:43:04 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:04 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:43:04 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:04 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:04 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:04 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:04 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:04 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: Code: May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 18 May 01 23:43:04 fir-md1-s1 kernel: 09 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 17 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: c2 May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 21 May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: f8 May 01 23:43:04 fir-md1-s1 kernel: 03 May 01 23:43:04 fir-md1-s1 kernel: 75 May 01 23:43:04 fir-md1-s1 kernel: 10 May 01 23:43:04 fir-md1-s1 kernel: eb May 01 23:43:04 fir-md1-s1 kernel: 1a May 01 23:43:04 fir-md1-s1 kernel: 66 May 01 23:43:04 fir-md1-s1 kernel: 2e May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 1f May 01 23:43:04 fir-md1-s1 kernel: 84 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 0c May 01 23:43:04 fir-md1-s1 kernel: f3 May 01 23:43:04 fir-md1-s1 kernel: 90 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 17 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: c2 May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: f8 May 01 23:43:04 fir-md1-s1 kernel: 03 May 01 23:43:04 fir-md1-s1 kernel: <75> May 01 23:43:04 fir-md1-s1 kernel: f0 May 01 23:43:04 fir-md1-s1 kernel: be May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 
kernel: eb May 01 23:43:04 fir-md1-s1 kernel: 15 May 01 23:43:04 fir-md1-s1 kernel: 66 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 1f May 01 23:43:04 fir-md1-s1 kernel: 84 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 89 May 01 23:43:04 fir-md1-s1 kernel: d0 May 01 23:43:04 fir-md1-s1 kernel: f0 May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [mdt_io00_073:103263] May 01 23:43:04 fir-md1-s1 kernel: Modules linked in: May 01 23:43:04 fir-md1-s1 kernel: osp(OE) May 01 23:43:04 fir-md1-s1 kernel: mdd(OE) May 01 23:43:04 fir-md1-s1 kernel: lod(OE) May 01 23:43:04 fir-md1-s1 kernel: mdt(OE) May 01 23:43:04 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:04 fir-md1-s1 kernel: mgs(OE) May 01 23:43:04 fir-md1-s1 kernel: mgc(OE) May 01 23:43:04 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lquota(OE) May 01 23:43:04 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lustre(OE) May 01 23:43:04 fir-md1-s1 kernel: lmv(OE) May 01 23:43:04 fir-md1-s1 kernel: mdc(OE) May 01 23:43:04 fir-md1-s1 kernel: osc(OE) May 01 23:43:04 fir-md1-s1 kernel: lov(OE) May 01 23:43:04 fir-md1-s1 kernel: fid(OE) May 01 23:43:04 fir-md1-s1 kernel: fld(OE) May 01 23:43:04 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:04 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:04 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:04 fir-md1-s1 kernel: lnet(OE) May 01 23:43:04 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:04 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:04 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:04 fir-md1-s1 kernel: nfsv4 May 01 23:43:04 fir-md1-s1 kernel: dns_resolver May 01 23:43:04 fir-md1-s1 kernel: nfs May 01 23:43:04 fir-md1-s1 kernel: lockd May 01 23:43:04 fir-md1-s1 kernel: grace 
May 01 23:43:04 fir-md1-s1 kernel: fscache May 01 23:43:04 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:04 fir-md1-s1 kernel: dell_rbu May 01 23:43:04 fir-md1-s1 kernel: sunrpc May 01 23:43:04 fir-md1-s1 kernel: vfat May 01 23:43:04 fir-md1-s1 kernel: fat May 01 23:43:04 fir-md1-s1 kernel: dm_round_robin May 01 23:43:04 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:04 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:04 fir-md1-s1 kernel: kvm_amd May 01 23:43:04 fir-md1-s1 kernel: kvm May 01 23:43:04 fir-md1-s1 kernel: ses May 01 23:43:04 fir-md1-s1 kernel: irqbypass May 01 23:43:04 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:04 fir-md1-s1 kernel: enclosure May 01 23:43:04 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:04 fir-md1-s1 kernel: dcdbas May 01 23:43:04 fir-md1-s1 kernel: aesni_intel May 01 23:43:04 fir-md1-s1 kernel: lrw May 01 23:43:04 fir-md1-s1 kernel: gf128mul May 01 23:43:04 fir-md1-s1 kernel: glue_helper May 01 23:43:04 fir-md1-s1 kernel: ablk_helper May 01 23:43:04 fir-md1-s1 kernel: cryptd May 01 23:43:04 fir-md1-s1 kernel: ipmi_si May 01 23:43:04 fir-md1-s1 kernel: pcspkr May 01 23:43:04 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:04 fir-md1-s1 kernel: ccp May 01 23:43:04 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:04 fir-md1-s1 kernel: dm_multipath May 01 23:43:04 fir-md1-s1 kernel: sg May 01 23:43:04 fir-md1-s1 kernel: k10temp May 01 23:43:04 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:04 fir-md1-s1 kernel: dm_mod May 01 23:43:04 fir-md1-s1 kernel: acpi_power_meter May 01 
23:43:04 fir-md1-s1 kernel: knem(OE) May 01 23:43:04 fir-md1-s1 kernel: ip_tables May 01 23:43:04 fir-md1-s1 kernel: ext4 May 01 23:43:04 fir-md1-s1 kernel: mbcache May 01 23:43:04 fir-md1-s1 kernel: jbd2 May 01 23:43:04 fir-md1-s1 kernel: sd_mod May 01 23:43:04 fir-md1-s1 kernel: crc_t10dif May 01 23:43:04 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:04 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:04 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:04 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:04 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:04 fir-md1-s1 kernel: syscopyarea May 01 23:43:04 fir-md1-s1 kernel: sysfillrect May 01 23:43:04 fir-md1-s1 kernel: sysimgblt May 01 23:43:04 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:04 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:04 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:04 fir-md1-s1 kernel: ttm May 01 23:43:04 fir-md1-s1 kernel: devlink May 01 23:43:04 fir-md1-s1 kernel: ahci May 01 23:43:04 fir-md1-s1 kernel: crct10dif_common May 01 23:43:04 fir-md1-s1 kernel: libahci May 01 23:43:04 fir-md1-s1 kernel: drm May 01 23:43:04 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:04 fir-md1-s1 kernel: tg3 May 01 23:43:04 fir-md1-s1 kernel: crc32c_intel May 01 23:43:04 fir-md1-s1 kernel: libata May 01 23:43:04 fir-md1-s1 kernel: megaraid_sas May 01 23:43:04 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:04 fir-md1-s1 kernel: ptp May 01 23:43:04 fir-md1-s1 kernel: pps_core May 01 23:43:04 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:04 fir-md1-s1 kernel: raid_class May 01 23:43:04 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:04 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: CPU: 12 PID: 103263 Comm: mdt_io00_073 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:04 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:04 fir-md1-s1 kernel: task: ffff984cba23e180 ti: ffff98286afac000 task.ti: ffff98286afac000 May 01 23:43:04 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:04 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x126/0x200 May 01 23:43:04 fir-md1-s1 kernel: RSP: 0018:ffff98286afaf750 EFLAGS: 00000246 May 01 23:43:04 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff9831739477d8 RCX: 0000000000610000 May 01 23:43:04 fir-md1-s1 kernel: RDX: ffff984cff81b780 RSI: 0000000001110101 RDI: ffff982c9fc8c480 May 01 23:43:04 fir-md1-s1 kernel: RBP: ffff98286afaf750 R08: ffff982cfeedb780 R09: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R10: ffff982cfeedf140 R11: ffffde3edb488200 R12: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R13: ffff98286afaf6f0 R14: ffff983173947548 R15: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: FS: 00007fde62083880(0000) GS:ffff982cfeec0000(0000) knlGS:0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:04 fir-md1-s1 kernel: CR2: 00007f427f58b000 CR3: 000000203caa6000 CR4: 00000000003407e0 May 01 23:43:04 fir-md1-s1 kernel: Call Trace: May 01 23:43:04 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:04 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? zone_statistics+0x88/0xa0 May 01 23:43:04 fir-md1-s1 kernel: [] ? qsd_op_begin+0xb1/0x4b0 [lquota] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? 
ldiskfs_inode_attach_jinode+0x55/0xd0 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] osd_write_commit+0x3a2/0x8c0 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_commitrw_write.isra.46+0x608/0xd20 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_obd_commitrw+0x29b/0x520 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] obd_commitrw+0x9c/0x370 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_brw_write+0x100d/0x1a90 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:04 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:04 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:04 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: Code: May 01 23:43:04 fir-md1-s1 kernel: 0d May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 98 May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: e2 May 01 23:43:04 fir-md1-s1 kernel: 30 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 81 May 01 23:43:04 fir-md1-s1 kernel: c2 May 01 23:43:04 fir-md1-s1 kernel: 80 May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 03 May 01 23:43:04 fir-md1-s1 kernel: 14 May 01 23:43:04 fir-md1-s1 kernel: c5 May 01 23:43:04 fir-md1-s1 kernel: 60 May 01 23:43:04 fir-md1-s1 kernel: b9 May 01 23:43:04 fir-md1-s1 kernel: b4 May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: 4c May 01 23:43:04 fir-md1-s1 kernel: 89 May 01 23:43:04 fir-md1-s1 kernel: 02 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 40 May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 75 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 1f May 01 23:43:04 fir-md1-s1 kernel: 44 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: f3 May 01 23:43:04 fir-md1-s1 kernel: 90 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 40 May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: <85> May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: f6 May 01 23:43:04 fir-md1-s1 kernel: 4d May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 
kernel: 4d May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c9 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 04 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 18 May 01 23:43:04 fir-md1-s1 kernel: 09 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 17 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: c2 May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [mdt00_018:102388] May 01 23:43:04 fir-md1-s1 kernel: Modules linked in: May 01 23:43:04 fir-md1-s1 kernel: osp(OE) May 01 23:43:04 fir-md1-s1 kernel: mdd(OE) May 01 23:43:04 fir-md1-s1 kernel: lod(OE) May 01 23:43:04 fir-md1-s1 kernel: mdt(OE) May 01 23:43:04 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:04 fir-md1-s1 kernel: mgs(OE) May 01 23:43:04 fir-md1-s1 kernel: mgc(OE) May 01 23:43:04 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lquota(OE) May 01 23:43:04 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lustre(OE) May 01 23:43:04 fir-md1-s1 kernel: lmv(OE) May 01 23:43:04 fir-md1-s1 kernel: mdc(OE) May 01 23:43:04 fir-md1-s1 kernel: osc(OE) May 01 23:43:04 fir-md1-s1 kernel: lov(OE) May 01 23:43:04 fir-md1-s1 kernel: fid(OE) May 01 23:43:04 fir-md1-s1 kernel: fld(OE) May 01 23:43:04 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:04 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:04 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:04 fir-md1-s1 kernel: lnet(OE) May 01 23:43:04 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:04 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:04 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:04 fir-md1-s1 kernel: nfsv4 May 01 23:43:04 fir-md1-s1 kernel: dns_resolver May 01 23:43:04 fir-md1-s1 kernel: nfs May 01 23:43:04 fir-md1-s1 kernel: lockd May 01 23:43:04 fir-md1-s1 kernel: grace May 
01 23:43:04 fir-md1-s1 kernel: fscache May 01 23:43:04 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:04 fir-md1-s1 kernel: dell_rbu May 01 23:43:04 fir-md1-s1 kernel: sunrpc May 01 23:43:04 fir-md1-s1 kernel: vfat May 01 23:43:04 fir-md1-s1 kernel: fat May 01 23:43:04 fir-md1-s1 kernel: dm_round_robin May 01 23:43:04 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:04 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:04 fir-md1-s1 kernel: kvm_amd May 01 23:43:04 fir-md1-s1 kernel: kvm May 01 23:43:04 fir-md1-s1 kernel: ses May 01 23:43:04 fir-md1-s1 kernel: irqbypass May 01 23:43:04 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:04 fir-md1-s1 kernel: enclosure May 01 23:43:04 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:04 fir-md1-s1 kernel: dcdbas May 01 23:43:04 fir-md1-s1 kernel: aesni_intel May 01 23:43:04 fir-md1-s1 kernel: lrw May 01 23:43:04 fir-md1-s1 kernel: gf128mul May 01 23:43:04 fir-md1-s1 kernel: glue_helper May 01 23:43:04 fir-md1-s1 kernel: ablk_helper May 01 23:43:04 fir-md1-s1 kernel: cryptd May 01 23:43:04 fir-md1-s1 kernel: ipmi_si May 01 23:43:04 fir-md1-s1 kernel: pcspkr May 01 23:43:04 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:04 fir-md1-s1 kernel: ccp May 01 23:43:04 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:04 fir-md1-s1 kernel: dm_multipath May 01 23:43:04 fir-md1-s1 kernel: sg May 01 23:43:04 fir-md1-s1 kernel: k10temp May 01 23:43:04 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:04 fir-md1-s1 kernel: dm_mod May 01 23:43:04 fir-md1-s1 kernel: acpi_power_meter May 01 
23:43:04 fir-md1-s1 kernel: knem(OE) May 01 23:43:04 fir-md1-s1 kernel: ip_tables May 01 23:43:04 fir-md1-s1 kernel: ext4 May 01 23:43:04 fir-md1-s1 kernel: mbcache May 01 23:43:04 fir-md1-s1 kernel: jbd2 May 01 23:43:04 fir-md1-s1 kernel: sd_mod May 01 23:43:04 fir-md1-s1 kernel: crc_t10dif May 01 23:43:04 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:04 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:04 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:04 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:04 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:04 fir-md1-s1 kernel: syscopyarea May 01 23:43:04 fir-md1-s1 kernel: sysfillrect May 01 23:43:04 fir-md1-s1 kernel: sysimgblt May 01 23:43:04 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:04 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:04 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:04 fir-md1-s1 kernel: ttm May 01 23:43:04 fir-md1-s1 kernel: devlink May 01 23:43:04 fir-md1-s1 kernel: ahci May 01 23:43:04 fir-md1-s1 kernel: crct10dif_common May 01 23:43:04 fir-md1-s1 kernel: libahci May 01 23:43:04 fir-md1-s1 kernel: drm May 01 23:43:04 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:04 fir-md1-s1 kernel: tg3 May 01 23:43:04 fir-md1-s1 kernel: crc32c_intel May 01 23:43:04 fir-md1-s1 kernel: libata May 01 23:43:04 fir-md1-s1 kernel: megaraid_sas May 01 23:43:04 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:04 fir-md1-s1 kernel: ptp May 01 23:43:04 fir-md1-s1 kernel: pps_core May 01 23:43:04 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:04 fir-md1-s1 kernel: raid_class May 01 23:43:04 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:04 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: CPU: 16 PID: 102388 Comm: mdt00_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:04 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:04 fir-md1-s1 kernel: task: ffff985884642080 ti: ffff984c4b64c000 task.ti: ffff984c4b64c000 May 01 23:43:04 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_inode_touch_time_cmp+0x40/0x90 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: RSP: 0018:ffff984c4b64f180 EFLAGS: 00000246 May 01 23:43:04 fir-md1-s1 kernel: RAX: 0000000000100000 RBX: ffffffffb7019f22 RCX: 000000010ca590b2 May 01 23:43:04 fir-md1-s1 kernel: RDX: ffff9824fb188380 RSI: ffff985cd731ce50 RDI: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: RBP: ffff984c4b64f180 R08: ffff984c4b64f300 R09: 00000000003ecd00 May 01 23:43:04 fir-md1-s1 kernel: R10: 0000000047bdbb01 R11: ffffde3f261ef6c0 R12: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R13: ffff982d3f224c80 R14: ffff983cf8b38400 R15: ffff982cfef254b8 May 01 23:43:04 fir-md1-s1 kernel: FS: 00007f32ccf2c740(0000) GS:ffff982cfef00000(0000) knlGS:0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:04 fir-md1-s1 kernel: CR2: 00007f32c5fef140 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:04 fir-md1-s1 kernel: Call Trace: May 01 23:43:04 fir-md1-s1 kernel: [] merge+0x62/0xc0 May 01 23:43:04 fir-md1-s1 kernel: [] ? ldiskfs_init_inode_table+0x410/0x410 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] list_sort+0x9b/0x250 May 01 23:43:04 fir-md1-s1 kernel: [] __ldiskfs_es_shrink+0x1ce/0x2a0 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_es_shrink+0xb4/0x130 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] shrink_slab+0x175/0x340 May 01 23:43:04 fir-md1-s1 kernel: [] ? zone_watermark_ok+0x1f/0x30 May 01 23:43:04 fir-md1-s1 kernel: [] ? compaction_suitable+0xa3/0xb0 May 01 23:43:04 fir-md1-s1 kernel: [] zone_reclaim+0x1d1/0x2f0 May 01 23:43:04 fir-md1-s1 kernel: [] get_page_from_freelist+0x87b/0xa70 May 01 23:43:04 fir-md1-s1 kernel: [] ? 
__getblk+0x2d/0x300 May 01 23:43:04 fir-md1-s1 kernel: [] __alloc_pages_nodemask+0x176/0x420 May 01 23:43:04 fir-md1-s1 kernel: [] alloc_pages_current+0x98/0x110 May 01 23:43:04 fir-md1-s1 kernel: [] new_slab+0x2c5/0x390 May 01 23:43:04 fir-md1-s1 kernel: [] ___slab_alloc+0x3ac/0x4f0 May 01 23:43:04 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:04 fir-md1-s1 kernel: [] ? fld_cache_lookup+0x36/0x1a0 [fld] May 01 23:43:04 fir-md1-s1 kernel: [] ? fld_local_lookup+0x62/0x270 [fld] May 01 23:43:04 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:04 fir-md1-s1 kernel: [] __slab_alloc+0x40/0x5c May 01 23:43:04 fir-md1-s1 kernel: [] kmem_cache_alloc+0x19b/0x1f0 May 01 23:43:04 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:04 fir-md1-s1 kernel: [] osp_object_alloc+0x40/0x170 [osp] May 01 23:43:04 fir-md1-s1 kernel: [] lod_object_init+0x1e7/0x3c0 [lod] May 01 23:43:04 fir-md1-s1 kernel: [] lu_object_alloc+0xe5/0x320 [obdclass] May 01 23:43:04 fir-md1-s1 kernel: [] lu_object_find_at+0x76/0x280 [obdclass] May 01 23:43:04 fir-md1-s1 kernel: [] lu_object_find_slice+0x1f/0x90 [obdclass] May 01 23:43:04 fir-md1-s1 kernel: [] mdd_object_find+0x10/0x70 [mdd] May 01 23:43:04 fir-md1-s1 kernel: [] obf_lookup+0x2c9/0x350 [mdd] May 01 23:43:04 fir-md1-s1 kernel: [] ? req_capsule_get_size+0x31/0x70 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0xf7c/0x1c30 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? __req_capsule_get+0x15f/0x740 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_get_flags+0x2c/0xa0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] ? 
mdt_intent_layout+0xcc0/0xcc0 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? cfs_hash_add+0xbe/0x1a0 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:04 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:04 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: Code: May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 15 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 8a May 01 23:43:04 fir-md1-s1 kernel: e8 May 01 23:43:04 fir-md1-s1 kernel: fc May 01 23:43:04 fir-md1-s1 kernel: ff May 01 23:43:04 fir-md1-s1 kernel: ff May 01 23:43:04 fir-md1-s1 kernel: b8 May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: c1 May 01 23:43:04 fir-md1-s1 kernel: e9 May 01 23:43:04 fir-md1-s1 kernel: 2b May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: e1 May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 29 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 86 May 01 23:43:04 fir-md1-s1 kernel: e8 May 01 23:43:04 fir-md1-s1 kernel: fc May 01 23:43:04 fir-md1-s1 kernel: ff May 01 23:43:04 fir-md1-s1 kernel: ff May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: c1 May 01 23:43:04 fir-md1-s1 kernel: e8 May 01 23:43:04 fir-md1-s1 kernel: 2b May 01 23:43:04 fir-md1-s1 kernel: a8 May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 24 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 4e May 01 23:43:04 fir-md1-s1 kernel: 18 May 01 23:43:04 fir-md1-s1 kernel: <48> May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 42 May 01 23:43:04 fir-md1-s1 kernel: 18 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 39 May 01 23:43:04 fir-md1-s1 kernel: c1 May 01 23:43:04 fir-md1-s1 
kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 37 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 29 May 01 23:43:04 fir-md1-s1 kernel: c8 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: c1 May 01 23:43:04 fir-md1-s1 kernel: f8 May 01 23:43:04 fir-md1-s1 kernel: 3f May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: e0 May 01 23:43:04 fir-md1-s1 kernel: 02 May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: e8 May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [mdt_io02_065:103134] May 01 23:43:04 fir-md1-s1 kernel: Modules linked in: May 01 23:43:04 fir-md1-s1 kernel: osp(OE) May 01 23:43:04 fir-md1-s1 kernel: mdd(OE) May 01 23:43:04 fir-md1-s1 kernel: lod(OE) May 01 23:43:04 fir-md1-s1 kernel: mdt(OE) May 01 23:43:04 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:04 fir-md1-s1 kernel: mgs(OE) May 01 23:43:04 fir-md1-s1 kernel: mgc(OE) May 01 23:43:04 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lquota(OE) May 01 23:43:04 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lustre(OE) May 01 23:43:04 fir-md1-s1 kernel: lmv(OE) May 01 23:43:04 fir-md1-s1 kernel: mdc(OE) May 01 23:43:04 fir-md1-s1 kernel: osc(OE) May 01 23:43:04 fir-md1-s1 kernel: lov(OE) May 01 23:43:04 fir-md1-s1 kernel: fid(OE) May 01 23:43:04 fir-md1-s1 kernel: fld(OE) May 01 23:43:04 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:04 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:04 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:04 fir-md1-s1 kernel: lnet(OE) May 01 23:43:04 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:04 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:04 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:04 fir-md1-s1 kernel: nfsv4 May 01 23:43:04 fir-md1-s1 kernel: dns_resolver May 01 23:43:04 fir-md1-s1 kernel: nfs May 01 23:43:04 fir-md1-s1 kernel: lockd May 01 23:43:04 fir-md1-s1 kernel: grace 
May 01 23:43:04 fir-md1-s1 kernel: fscache May 01 23:43:04 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:04 fir-md1-s1 kernel: dell_rbu May 01 23:43:04 fir-md1-s1 kernel: sunrpc May 01 23:43:04 fir-md1-s1 kernel: vfat May 01 23:43:04 fir-md1-s1 kernel: fat May 01 23:43:04 fir-md1-s1 kernel: dm_round_robin May 01 23:43:04 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:04 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:04 fir-md1-s1 kernel: kvm_amd May 01 23:43:04 fir-md1-s1 kernel: kvm May 01 23:43:04 fir-md1-s1 kernel: ses May 01 23:43:04 fir-md1-s1 kernel: irqbypass May 01 23:43:04 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:04 fir-md1-s1 kernel: enclosure May 01 23:43:04 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:04 fir-md1-s1 kernel: dcdbas May 01 23:43:04 fir-md1-s1 kernel: aesni_intel May 01 23:43:04 fir-md1-s1 kernel: lrw May 01 23:43:04 fir-md1-s1 kernel: gf128mul May 01 23:43:04 fir-md1-s1 kernel: glue_helper May 01 23:43:04 fir-md1-s1 kernel: ablk_helper May 01 23:43:04 fir-md1-s1 kernel: cryptd May 01 23:43:04 fir-md1-s1 kernel: ipmi_si May 01 23:43:04 fir-md1-s1 kernel: pcspkr May 01 23:43:04 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:04 fir-md1-s1 kernel: ccp May 01 23:43:04 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:04 fir-md1-s1 kernel: dm_multipath May 01 23:43:04 fir-md1-s1 kernel: sg May 01 23:43:04 fir-md1-s1 kernel: k10temp May 01 23:43:04 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:04 fir-md1-s1 kernel: dm_mod May 01 23:43:04 fir-md1-s1 kernel: acpi_power_meter May 01 
23:43:04 fir-md1-s1 kernel: knem(OE) May 01 23:43:04 fir-md1-s1 kernel: ip_tables May 01 23:43:04 fir-md1-s1 kernel: ext4 May 01 23:43:04 fir-md1-s1 kernel: mbcache May 01 23:43:04 fir-md1-s1 kernel: jbd2 May 01 23:43:04 fir-md1-s1 kernel: sd_mod May 01 23:43:04 fir-md1-s1 kernel: crc_t10dif May 01 23:43:04 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:04 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:04 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:04 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:04 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:04 fir-md1-s1 kernel: syscopyarea May 01 23:43:04 fir-md1-s1 kernel: sysfillrect May 01 23:43:04 fir-md1-s1 kernel: sysimgblt May 01 23:43:04 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:04 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:04 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:04 fir-md1-s1 kernel: ttm May 01 23:43:04 fir-md1-s1 kernel: devlink May 01 23:43:04 fir-md1-s1 kernel: ahci May 01 23:43:04 fir-md1-s1 kernel: crct10dif_common May 01 23:43:04 fir-md1-s1 kernel: libahci May 01 23:43:04 fir-md1-s1 kernel: drm May 01 23:43:04 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:04 fir-md1-s1 kernel: tg3 May 01 23:43:04 fir-md1-s1 kernel: crc32c_intel May 01 23:43:04 fir-md1-s1 kernel: libata May 01 23:43:04 fir-md1-s1 kernel: megaraid_sas May 01 23:43:04 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:04 fir-md1-s1 kernel: ptp May 01 23:43:04 fir-md1-s1 kernel: pps_core May 01 23:43:04 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:04 fir-md1-s1 kernel: raid_class May 01 23:43:04 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:04 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: CPU: 18 PID: 103134 Comm: mdt_io02_065 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:04 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:04 fir-md1-s1 kernel: task: ffff985ccda90000 ti: ffff98583efd4000 task.ti: ffff98583efd4000 May 01 23:43:04 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:04 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:04 fir-md1-s1 kernel: RSP: 0018:ffff98583efd7750 EFLAGS: 00000246 May 01 23:43:04 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff983164d60378 RCX: 0000000000910000 May 01 23:43:04 fir-md1-s1 kernel: RDX: ffff983cff69b780 RSI: 0000000000490101 RDI: ffff982c9fc8c480 May 01 23:43:04 fir-md1-s1 kernel: RBP: ffff98583efd7750 R08: ffff984cff71b780 R09: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R10: ffff984cff71f140 R11: ffffde3ef98bd000 R12: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R13: ffff98583efd76f0 R14: ffff983164d600e8 R15: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: FS: 00007f010bbcf880(0000) GS:ffff984cff700000(0000) knlGS:0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:04 fir-md1-s1 kernel: CR2: 0000000001c9e8e0 CR3: 000000402db9c000 CR4: 00000000003407e0 May 01 23:43:04 fir-md1-s1 kernel: Call Trace: May 01 23:43:04 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:04 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? zone_statistics+0x88/0xa0 May 01 23:43:04 fir-md1-s1 kernel: [] ? qsd_op_begin+0xb1/0x4b0 [lquota] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? 
ldiskfs_inode_attach_jinode+0x55/0xd0 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] osd_write_commit+0x3a2/0x8c0 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_commitrw_write.isra.46+0x608/0xd20 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_obd_commitrw+0x29b/0x520 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] obd_commitrw+0x9c/0x370 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_brw_write+0x100d/0x1a90 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:04 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:04 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:04 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: Code: May 01 23:43:04 fir-md1-s1 kernel: 13 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: c1 May 01 23:43:04 fir-md1-s1 kernel: ea May 01 23:43:04 fir-md1-s1 kernel: 0d May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 98 May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: e2 May 01 23:43:04 fir-md1-s1 kernel: 30 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 81 May 01 23:43:04 fir-md1-s1 kernel: c2 May 01 23:43:04 fir-md1-s1 kernel: 80 May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 03 May 01 23:43:04 fir-md1-s1 kernel: 14 May 01 23:43:04 fir-md1-s1 kernel: c5 May 01 23:43:04 fir-md1-s1 kernel: 60 May 01 23:43:04 fir-md1-s1 kernel: b9 May 01 23:43:04 fir-md1-s1 kernel: b4 May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: 4c May 01 23:43:04 fir-md1-s1 kernel: 89 May 01 23:43:04 fir-md1-s1 kernel: 02 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 40 May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 75 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 1f May 01 23:43:04 fir-md1-s1 kernel: 44 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: f3 May 01 23:43:04 fir-md1-s1 kernel: 90 May 01 23:43:04 fir-md1-s1 kernel: <41> May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 40 May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 
kernel: f6 May 01 23:43:04 fir-md1-s1 kernel: 4d May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: 4d May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c9 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 04 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 18 May 01 23:43:04 fir-md1-s1 kernel: 09 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [mdt_io01_082:103083] May 01 23:43:04 fir-md1-s1 kernel: Modules linked in: May 01 23:43:04 fir-md1-s1 kernel: osp(OE) May 01 23:43:04 fir-md1-s1 kernel: mdd(OE) May 01 23:43:04 fir-md1-s1 kernel: lod(OE) May 01 23:43:04 fir-md1-s1 kernel: mdt(OE) May 01 23:43:04 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:04 fir-md1-s1 kernel: mgs(OE) May 01 23:43:04 fir-md1-s1 kernel: mgc(OE) May 01 23:43:04 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lquota(OE) May 01 23:43:04 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lustre(OE) May 01 23:43:04 fir-md1-s1 kernel: lmv(OE) May 01 23:43:04 fir-md1-s1 kernel: mdc(OE) May 01 23:43:04 fir-md1-s1 kernel: osc(OE) May 01 23:43:04 fir-md1-s1 kernel: lov(OE) May 01 23:43:04 fir-md1-s1 kernel: fid(OE) May 01 23:43:04 fir-md1-s1 kernel: fld(OE) May 01 23:43:04 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:04 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:04 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:04 fir-md1-s1 kernel: lnet(OE) May 01 23:43:04 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:04 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:04 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:04 fir-md1-s1 kernel: nfsv4 May 01 23:43:04 fir-md1-s1 kernel: dns_resolver May 01 23:43:04 fir-md1-s1 kernel: nfs May 01 23:43:04 fir-md1-s1 kernel: lockd May 01 23:43:04 fir-md1-s1 kernel: grace 
May 01 23:43:04 fir-md1-s1 kernel: fscache May 01 23:43:04 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:04 fir-md1-s1 kernel: dell_rbu May 01 23:43:04 fir-md1-s1 kernel: sunrpc May 01 23:43:04 fir-md1-s1 kernel: vfat May 01 23:43:04 fir-md1-s1 kernel: fat May 01 23:43:04 fir-md1-s1 kernel: dm_round_robin May 01 23:43:04 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:04 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:04 fir-md1-s1 kernel: kvm_amd May 01 23:43:04 fir-md1-s1 kernel: kvm May 01 23:43:04 fir-md1-s1 kernel: ses May 01 23:43:04 fir-md1-s1 kernel: irqbypass May 01 23:43:04 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:04 fir-md1-s1 kernel: enclosure May 01 23:43:04 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:04 fir-md1-s1 kernel: dcdbas May 01 23:43:04 fir-md1-s1 kernel: aesni_intel May 01 23:43:04 fir-md1-s1 kernel: lrw May 01 23:43:04 fir-md1-s1 kernel: gf128mul May 01 23:43:04 fir-md1-s1 kernel: glue_helper May 01 23:43:04 fir-md1-s1 kernel: ablk_helper May 01 23:43:04 fir-md1-s1 kernel: cryptd May 01 23:43:04 fir-md1-s1 kernel: ipmi_si May 01 23:43:04 fir-md1-s1 kernel: pcspkr May 01 23:43:04 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:04 fir-md1-s1 kernel: ccp May 01 23:43:04 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:04 fir-md1-s1 kernel: dm_multipath May 01 23:43:04 fir-md1-s1 kernel: sg May 01 23:43:04 fir-md1-s1 kernel: k10temp May 01 23:43:04 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:04 fir-md1-s1 kernel: dm_mod May 01 23:43:04 fir-md1-s1 kernel: acpi_power_meter May 01 
23:43:04 fir-md1-s1 kernel: knem(OE) May 01 23:43:04 fir-md1-s1 kernel: ip_tables May 01 23:43:04 fir-md1-s1 kernel: ext4 May 01 23:43:04 fir-md1-s1 kernel: mbcache May 01 23:43:04 fir-md1-s1 kernel: jbd2 May 01 23:43:04 fir-md1-s1 kernel: sd_mod May 01 23:43:04 fir-md1-s1 kernel: crc_t10dif May 01 23:43:04 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:04 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:04 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:04 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:04 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:04 fir-md1-s1 kernel: syscopyarea May 01 23:43:04 fir-md1-s1 kernel: sysfillrect May 01 23:43:04 fir-md1-s1 kernel: sysimgblt May 01 23:43:04 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:04 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:04 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:04 fir-md1-s1 kernel: ttm May 01 23:43:04 fir-md1-s1 kernel: devlink May 01 23:43:04 fir-md1-s1 kernel: ahci May 01 23:43:04 fir-md1-s1 kernel: crct10dif_common May 01 23:43:04 fir-md1-s1 kernel: libahci May 01 23:43:04 fir-md1-s1 kernel: drm May 01 23:43:04 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:04 fir-md1-s1 kernel: tg3 May 01 23:43:04 fir-md1-s1 kernel: crc32c_intel May 01 23:43:04 fir-md1-s1 kernel: libata May 01 23:43:04 fir-md1-s1 kernel: megaraid_sas May 01 23:43:04 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:04 fir-md1-s1 kernel: ptp May 01 23:43:04 fir-md1-s1 kernel: pps_core May 01 23:43:04 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:04 fir-md1-s1 kernel: raid_class May 01 23:43:04 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:04 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: CPU: 25 PID: 103083 Comm: mdt_io01_082 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:04 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:04 fir-md1-s1 kernel: task: ffff985cfe905140 ti: ffff985ccaf18000 task.ti: ffff985ccaf18000 May 01 23:43:04 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:04 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x126/0x200 May 01 23:43:04 fir-md1-s1 kernel: RSP: 0018:ffff985ccaf1b800 EFLAGS: 00000246 May 01 23:43:04 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff9831703ceb60 RCX: 0000000000c90000 May 01 23:43:04 fir-md1-s1 kernel: RDX: ffff984cff71b780 RSI: 0000000000910101 RDI: ffff982c9fc8c480 May 01 23:43:04 fir-md1-s1 kernel: RBP: ffff985ccaf1b800 R08: ffff983cff79b780 R09: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R10: ffff983cff79f140 R11: ffffde3fa5607800 R12: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R13: ffff985ccaf1b7a0 R14: ffff9831703ce8d0 R15: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: FS: 00007f427f792740(0000) GS:ffff983cff780000(0000) knlGS:0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:04 fir-md1-s1 kernel: CR2: 00007f427f58b000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:04 fir-md1-s1 kernel: Call Trace: May 01 23:43:04 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:04 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:43:04 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:04 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:04 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:04 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:04 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:04 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: Code: May 01 23:43:04 fir-md1-s1 kernel: 0d May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 98 May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: e2 May 01 23:43:04 fir-md1-s1 kernel: 30 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 81 May 01 23:43:04 fir-md1-s1 kernel: c2 May 01 23:43:04 fir-md1-s1 kernel: 80 May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 03 May 01 23:43:04 fir-md1-s1 kernel: 14 May 01 23:43:04 fir-md1-s1 kernel: c5 May 01 23:43:04 fir-md1-s1 kernel: 60 May 01 23:43:04 fir-md1-s1 kernel: b9 May 01 23:43:04 fir-md1-s1 kernel: b4 May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: 4c May 01 23:43:04 fir-md1-s1 kernel: 89 May 01 23:43:04 fir-md1-s1 kernel: 02 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 40 May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 75 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 1f May 01 23:43:04 fir-md1-s1 kernel: 44 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: f3 May 01 23:43:04 fir-md1-s1 kernel: 90 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 40 May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: <85> May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: f6 May 01 23:43:04 fir-md1-s1 kernel: 4d May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 
kernel: 4d May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c9 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 04 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 18 May 01 23:43:04 fir-md1-s1 kernel: 09 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 17 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: c2 May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! [mdt_io02_034:102984] May 01 23:43:04 fir-md1-s1 kernel: Modules linked in: May 01 23:43:04 fir-md1-s1 kernel: osp(OE) May 01 23:43:04 fir-md1-s1 kernel: mdd(OE) May 01 23:43:04 fir-md1-s1 kernel: lod(OE) May 01 23:43:04 fir-md1-s1 kernel: mdt(OE) May 01 23:43:04 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:04 fir-md1-s1 kernel: mgs(OE) May 01 23:43:04 fir-md1-s1 kernel: mgc(OE) May 01 23:43:04 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lquota(OE) May 01 23:43:04 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:04 fir-md1-s1 kernel: lustre(OE) May 01 23:43:04 fir-md1-s1 kernel: lmv(OE) May 01 23:43:04 fir-md1-s1 kernel: mdc(OE) May 01 23:43:04 fir-md1-s1 kernel: osc(OE) May 01 23:43:04 fir-md1-s1 kernel: lov(OE) May 01 23:43:04 fir-md1-s1 kernel: fid(OE) May 01 23:43:04 fir-md1-s1 kernel: fld(OE) May 01 23:43:04 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:04 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:04 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:04 fir-md1-s1 kernel: lnet(OE) May 01 23:43:04 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:04 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:04 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:04 fir-md1-s1 kernel: nfsv4 May 01 23:43:04 fir-md1-s1 kernel: dns_resolver May 01 23:43:04 fir-md1-s1 kernel: nfs May 01 23:43:04 fir-md1-s1 kernel: lockd May 01 23:43:04 fir-md1-s1 kernel: grace 
May 01 23:43:04 fir-md1-s1 kernel: fscache May 01 23:43:04 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:04 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:04 fir-md1-s1 kernel: dell_rbu May 01 23:43:04 fir-md1-s1 kernel: sunrpc May 01 23:43:04 fir-md1-s1 kernel: vfat May 01 23:43:04 fir-md1-s1 kernel: fat May 01 23:43:04 fir-md1-s1 kernel: dm_round_robin May 01 23:43:04 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:04 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:04 fir-md1-s1 kernel: kvm_amd May 01 23:43:04 fir-md1-s1 kernel: kvm May 01 23:43:04 fir-md1-s1 kernel: ses May 01 23:43:04 fir-md1-s1 kernel: irqbypass May 01 23:43:04 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:04 fir-md1-s1 kernel: enclosure May 01 23:43:04 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:04 fir-md1-s1 kernel: dcdbas May 01 23:43:04 fir-md1-s1 kernel: aesni_intel May 01 23:43:04 fir-md1-s1 kernel: lrw May 01 23:43:04 fir-md1-s1 kernel: gf128mul May 01 23:43:04 fir-md1-s1 kernel: glue_helper May 01 23:43:04 fir-md1-s1 kernel: ablk_helper May 01 23:43:04 fir-md1-s1 kernel: cryptd May 01 23:43:04 fir-md1-s1 kernel: ipmi_si May 01 23:43:04 fir-md1-s1 kernel: pcspkr May 01 23:43:04 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:04 fir-md1-s1 kernel: ccp May 01 23:43:04 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:04 fir-md1-s1 kernel: dm_multipath May 01 23:43:04 fir-md1-s1 kernel: sg May 01 23:43:04 fir-md1-s1 kernel: k10temp May 01 23:43:04 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:04 fir-md1-s1 kernel: dm_mod May 01 23:43:04 fir-md1-s1 kernel: acpi_power_meter May 01 
23:43:04 fir-md1-s1 kernel: knem(OE) May 01 23:43:04 fir-md1-s1 kernel: ip_tables May 01 23:43:04 fir-md1-s1 kernel: ext4 May 01 23:43:04 fir-md1-s1 kernel: mbcache May 01 23:43:04 fir-md1-s1 kernel: jbd2 May 01 23:43:04 fir-md1-s1 kernel: sd_mod May 01 23:43:04 fir-md1-s1 kernel: crc_t10dif May 01 23:43:04 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:04 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:04 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:04 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:04 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:04 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:04 fir-md1-s1 kernel: syscopyarea May 01 23:43:04 fir-md1-s1 kernel: sysfillrect May 01 23:43:04 fir-md1-s1 kernel: sysimgblt May 01 23:43:04 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:04 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:04 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:04 fir-md1-s1 kernel: ttm May 01 23:43:04 fir-md1-s1 kernel: devlink May 01 23:43:04 fir-md1-s1 kernel: ahci May 01 23:43:04 fir-md1-s1 kernel: crct10dif_common May 01 23:43:04 fir-md1-s1 kernel: libahci May 01 23:43:04 fir-md1-s1 kernel: drm May 01 23:43:04 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:04 fir-md1-s1 kernel: tg3 May 01 23:43:04 fir-md1-s1 kernel: crc32c_intel May 01 23:43:04 fir-md1-s1 kernel: libata May 01 23:43:04 fir-md1-s1 kernel: megaraid_sas May 01 23:43:04 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:04 fir-md1-s1 kernel: ptp May 01 23:43:04 fir-md1-s1 kernel: pps_core May 01 23:43:04 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:04 fir-md1-s1 kernel: raid_class May 01 23:43:04 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:04 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: CPU: 34 PID: 102984 Comm: mdt_io02_034 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:04 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:04 fir-md1-s1 kernel: task: ffff982cf9de4100 ti: ffff985ce80d4000 task.ti: ffff985ce80d4000 May 01 23:43:04 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:04 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:04 fir-md1-s1 kernel: RSP: 0018:ffff985ce80d7800 EFLAGS: 00000246 May 01 23:43:04 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff983165105698 RCX: 0000000001110000 May 01 23:43:04 fir-md1-s1 kernel: RDX: ffff983cff79b780 RSI: 0000000000c90101 RDI: ffff982c9fc8c480 May 01 23:43:04 fir-md1-s1 kernel: RBP: ffff985ce80d7800 R08: ffff984cff81b780 R09: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R10: ffff984cff81f140 R11: ffffde3fa7770c00 R12: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R13: ffff985ce80d77a0 R14: ffff983165105408 R15: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: FS: 00007fe19c902740(0000) GS:ffff984cff800000(0000) knlGS:0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:04 fir-md1-s1 kernel: CR2: 00007fe19bb327c0 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:04 fir-md1-s1 kernel: Call Trace: May 01 23:43:04 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:04 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:43:04 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? tgt_free_reply_data+0x128/0x3b0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? kfree+0x106/0x140 May 01 23:43:04 fir-md1-s1 kernel: [] ? tgt_free_reply_data+0x128/0x3b0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:04 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:04 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: Code: May 01 23:43:04 fir-md1-s1 kernel: 13 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: c1 May 01 23:43:04 fir-md1-s1 kernel: ea May 01 23:43:04 fir-md1-s1 kernel: 0d May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 98 May 01 23:43:04 fir-md1-s1 kernel: 83 May 01 23:43:04 fir-md1-s1 kernel: e2 May 01 23:43:04 fir-md1-s1 kernel: 30 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 81 May 01 23:43:04 fir-md1-s1 kernel: c2 May 01 23:43:04 fir-md1-s1 kernel: 80 May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: 01 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 48 May 01 23:43:04 fir-md1-s1 kernel: 03 May 01 23:43:04 fir-md1-s1 kernel: 14 May 01 23:43:04 fir-md1-s1 kernel: c5 May 01 23:43:04 fir-md1-s1 kernel: 60 May 01 23:43:04 fir-md1-s1 kernel: b9 May 01 23:43:04 fir-md1-s1 kernel: b4 May 01 23:43:04 fir-md1-s1 kernel: b7 May 01 23:43:04 fir-md1-s1 kernel: 4c May 01 23:43:04 fir-md1-s1 kernel: 89 May 01 23:43:04 fir-md1-s1 kernel: 02 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 40 May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 75 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 1f May 01 23:43:04 fir-md1-s1 kernel: 44 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: 00 May 01 23:43:04 fir-md1-s1 kernel: f3 May 01 23:43:04 fir-md1-s1 kernel: 90 May 01 23:43:04 fir-md1-s1 kernel: <41> May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 40 May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c0 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 
kernel: f6 May 01 23:43:04 fir-md1-s1 kernel: 4d May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: 08 May 01 23:43:04 fir-md1-s1 kernel: 4d May 01 23:43:04 fir-md1-s1 kernel: 85 May 01 23:43:04 fir-md1-s1 kernel: c9 May 01 23:43:04 fir-md1-s1 kernel: 74 May 01 23:43:04 fir-md1-s1 kernel: 04 May 01 23:43:04 fir-md1-s1 kernel: 41 May 01 23:43:04 fir-md1-s1 kernel: 0f May 01 23:43:04 fir-md1-s1 kernel: 18 May 01 23:43:04 fir-md1-s1 kernel: 09 May 01 23:43:04 fir-md1-s1 kernel: 8b May 01 23:43:04 fir-md1-s1 kernel: May 01 23:43:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID May 01 23:43:04 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:04 fir-md1-s1 kernel: mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] May 01 23:43:04 fir-md1-s1 kernel: CPU: 4 PID: 103101 Comm: mdt_io00_057 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:04 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:04 fir-md1-s1 kernel: task: ffff985c827130c0 ti: ffff985c1a30c000 task.ti: ffff985c1a30c000 May 01 23:43:04 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x1d6/0x200 May 01 23:43:04 fir-md1-s1 kernel: RSP: 0018:ffff985c1a30f800 EFLAGS: 00000293 May 01 23:43:04 fir-md1-s1 kernel: RAX: 0000000000000001 RBX: ffff9831703ce738 RCX: 0000000000000001 May 01 23:43:04 fir-md1-s1 kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff982c9fc8c480 May 01 23:43:04 fir-md1-s1 kernel: RBP: ffff985c1a30f800 R08: 0000000000000101 R09: ffffffffc1231d1a May 01 23:43:04 fir-md1-s1 kernel: R10: ffff982cfee5f140 R11: ffffde3ed5b1ce00 R12: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: R13: ffff985c1a30f7a0 R14: ffff9831703ce4a8 R15: 0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: FS: 00007f427f792740(0000) GS:ffff982cfee40000(0000) knlGS:0000000000000000 May 01 23:43:04 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:04 fir-md1-s1 kernel: CR2: 00007f427f58b000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:04 fir-md1-s1 kernel: Call Trace: May 01 23:43:04 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:04 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:43:04 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:43:04 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:04 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:04 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:04 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:04 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:04 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:04 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:04 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:04 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:04 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:04 fir-md1-s1 kernel: Code: f4 e9 93 fe ff ff 0f 1f 80 00 00 00 00 83 fa 01 75 11 0f 1f 00 e9 68 fe ff ff 0f 1f 00 85 c0 74 0c f3 90 8b 07 0f b6 c0 83 f8 03 <75> f0 b8 01 00 00 00 66 89 07 5d c3 66 0f 1f 44 00 00 f3 90 4d May 01 23:43:05 fir-md1-s1 kernel: Lustre: 101395:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556779364/real 1556779364] req@ffff98555c27e600 x1632254604128256/t0(0) o601->fir-MDT0000-lwp-MDT0002@0@lo:23/10 lens 336/336 e 1 to 1 dl 1556779385 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 May 01 23:43:05 fir-md1-s1 kernel: Lustre: 101395:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 01 23:43:07 fir-md1-s1 kernel: INFO: rcu_sched self-detected stall on CPU May 01 23:43:07 fir-md1-s1 kernel: INFO: rcu_sched detected stalls on CPUs/tasks: May 01 23:43:07 fir-md1-s1 kernel: { May 01 23:43:07 fir-md1-s1 kernel: 16 May 01 23:43:07 fir-md1-s1 kernel: } May 01 23:43:07 fir-md1-s1 kernel: (detected by 9, t=60002 jiffies, g=60007355, c=60007354, q=340009) May 01 23:43:07 fir-md1-s1 kernel: Task dump for CPU 16: May 01 23:43:07 fir-md1-s1 kernel: mdt00_018 R May 01 23:43:07 fir-md1-s1 kernel: running task May 01 23:43:07 fir-md1-s1 kernel: 0 102388 2 0x00000088 May 01 23:43:07 fir-md1-s1 kernel: Call Trace: May 01 23:43:07 fir-md1-s1 kernel: [] ? ldiskfs_es_shrink+0xb4/0x130 [ldiskfs] May 01 23:43:07 fir-md1-s1 kernel: [] ? shrink_slab+0x175/0x340 May 01 23:43:07 fir-md1-s1 kernel: [] ? zone_watermark_ok+0x1f/0x30 May 01 23:43:07 fir-md1-s1 kernel: [] ? compaction_suitable+0xa3/0xb0 May 01 23:43:07 fir-md1-s1 kernel: [] ? zone_reclaim+0x1d1/0x2f0 May 01 23:43:07 fir-md1-s1 kernel: [] ? get_page_from_freelist+0x87b/0xa70 May 01 23:43:07 fir-md1-s1 kernel: [] ? __getblk+0x2d/0x300 May 01 23:43:07 fir-md1-s1 kernel: [] ? __alloc_pages_nodemask+0x176/0x420 May 01 23:43:07 fir-md1-s1 kernel: [] ? 
alloc_pages_current+0x98/0x110 May 01 23:43:07 fir-md1-s1 kernel: [] ? new_slab+0x2c5/0x390 May 01 23:43:07 fir-md1-s1 kernel: [] ? ___slab_alloc+0x3ac/0x4f0 May 01 23:43:07 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:07 fir-md1-s1 kernel: [] ? fld_cache_lookup+0x36/0x1a0 [fld] May 01 23:43:07 fir-md1-s1 kernel: [] ? fld_local_lookup+0x62/0x270 [fld] May 01 23:43:07 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:07 fir-md1-s1 kernel: [] ? __slab_alloc+0x40/0x5c May 01 23:43:07 fir-md1-s1 kernel: [] ? kmem_cache_alloc+0x19b/0x1f0 May 01 23:43:07 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:07 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:07 fir-md1-s1 kernel: [] ? lod_object_init+0x1e7/0x3c0 [lod] May 01 23:43:07 fir-md1-s1 kernel: [] ? lu_object_alloc+0xe5/0x320 [obdclass] May 01 23:43:07 fir-md1-s1 kernel: [] ? lu_object_find_at+0x76/0x280 [obdclass] May 01 23:43:07 fir-md1-s1 kernel: [] ? lu_object_find_slice+0x1f/0x90 [obdclass] May 01 23:43:07 fir-md1-s1 kernel: [] ? mdd_object_find+0x10/0x70 [mdd] May 01 23:43:07 fir-md1-s1 kernel: [] ? obf_lookup+0x2c9/0x350 [mdd] May 01 23:43:07 fir-md1-s1 kernel: [] ? req_capsule_get_size+0x31/0x70 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? mdt_getattr_name_lock+0xf7c/0x1c30 [mdt] May 01 23:43:07 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? __req_capsule_get+0x15f/0x740 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? lustre_msg_get_flags+0x2c/0xa0 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? mdt_intent_getattr+0x2b5/0x480 [mdt] May 01 23:43:07 fir-md1-s1 kernel: [] ? mdt_intent_policy+0x2e8/0xd00 [mdt] May 01 23:43:07 fir-md1-s1 kernel: [] ? mdt_intent_layout+0xcc0/0xcc0 [mdt] May 01 23:43:07 fir-md1-s1 kernel: [] ? ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? 
cfs_hash_bd_add_locked+0x63/0x80 [libcfs] May 01 23:43:07 fir-md1-s1 kernel: [] ? cfs_hash_add+0xbe/0x1a0 [libcfs] May 01 23:43:07 fir-md1-s1 kernel: [] ? ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? tgt_enqueue+0x62/0x210 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:07 fir-md1-s1 kernel: [] ? ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:07 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:07 fir-md1-s1 kernel: [] ? ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? kthread+0xd1/0xe0 May 01 23:43:07 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:07 fir-md1-s1 kernel: [] ? ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:07 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:07 fir-md1-s1 kernel: { May 01 23:43:07 fir-md1-s1 kernel: 16} (t=60374 jiffies g=60007355 c=60007354 q=340300) May 01 23:43:07 fir-md1-s1 kernel: Task dump for CPU 16: May 01 23:43:07 fir-md1-s1 kernel: mdt00_018 R running task 0 102388 2 0x00000088 May 01 23:43:07 fir-md1-s1 kernel: Call Trace: May 01 23:43:07 fir-md1-s1 kernel: [] sched_show_task+0xa8/0x110 May 01 23:43:07 fir-md1-s1 kernel: [] dump_cpu_task+0x39/0x70 May 01 23:43:07 fir-md1-s1 kernel: [] rcu_dump_cpu_stacks+0x90/0xd0 May 01 23:43:07 fir-md1-s1 kernel: [] rcu_check_callbacks+0x442/0x730 May 01 23:43:07 fir-md1-s1 kernel: [] ? 
tick_sched_do_timer+0x50/0x50 May 01 23:43:07 fir-md1-s1 kernel: [] update_process_times+0x46/0x80 May 01 23:43:07 fir-md1-s1 kernel: [] tick_sched_handle+0x30/0x70 May 01 23:43:07 fir-md1-s1 kernel: [] tick_sched_timer+0x39/0x80 May 01 23:43:07 fir-md1-s1 kernel: [] __hrtimer_run_queues+0xf3/0x270 May 01 23:43:07 fir-md1-s1 kernel: [] hrtimer_interrupt+0xaf/0x1d0 May 01 23:43:07 fir-md1-s1 kernel: [] ? ldiskfs_init_inode_table+0x410/0x410 [ldiskfs] May 01 23:43:07 fir-md1-s1 kernel: [] local_apic_timer_interrupt+0x3b/0x60 May 01 23:43:07 fir-md1-s1 kernel: [] smp_apic_timer_interrupt+0x43/0x60 May 01 23:43:07 fir-md1-s1 kernel: [] apic_timer_interrupt+0x162/0x170 May 01 23:43:07 fir-md1-s1 kernel: [] ? unfreeze_partials.isra.44+0xd2/0x130 May 01 23:43:07 fir-md1-s1 kernel: [] ? ldiskfs_inode_touch_time_cmp+0x14/0x90 [ldiskfs] May 01 23:43:07 fir-md1-s1 kernel: [] merge+0x62/0xc0 May 01 23:43:07 fir-md1-s1 kernel: [] ? ldiskfs_init_inode_table+0x410/0x410 [ldiskfs] May 01 23:43:07 fir-md1-s1 kernel: [] list_sort+0x9b/0x250 May 01 23:43:07 fir-md1-s1 kernel: [] __ldiskfs_es_shrink+0x1ce/0x2a0 [ldiskfs] May 01 23:43:07 fir-md1-s1 kernel: [] ldiskfs_es_shrink+0xb4/0x130 [ldiskfs] May 01 23:43:07 fir-md1-s1 kernel: [] shrink_slab+0x175/0x340 May 01 23:43:07 fir-md1-s1 kernel: [] ? zone_watermark_ok+0x1f/0x30 May 01 23:43:07 fir-md1-s1 kernel: [] ? compaction_suitable+0xa3/0xb0 May 01 23:43:07 fir-md1-s1 kernel: [] zone_reclaim+0x1d1/0x2f0 May 01 23:43:07 fir-md1-s1 kernel: [] get_page_from_freelist+0x87b/0xa70 May 01 23:43:07 fir-md1-s1 kernel: [] ? __getblk+0x2d/0x300 May 01 23:43:07 fir-md1-s1 kernel: [] __alloc_pages_nodemask+0x176/0x420 May 01 23:43:07 fir-md1-s1 kernel: [] alloc_pages_current+0x98/0x110 May 01 23:43:07 fir-md1-s1 kernel: [] new_slab+0x2c5/0x390 May 01 23:43:07 fir-md1-s1 kernel: [] ___slab_alloc+0x3ac/0x4f0 May 01 23:43:07 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:07 fir-md1-s1 kernel: [] ? 
fld_cache_lookup+0x36/0x1a0 [fld] May 01 23:43:07 fir-md1-s1 kernel: [] ? fld_local_lookup+0x62/0x270 [fld] May 01 23:43:07 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:07 fir-md1-s1 kernel: [] __slab_alloc+0x40/0x5c May 01 23:43:07 fir-md1-s1 kernel: [] kmem_cache_alloc+0x19b/0x1f0 May 01 23:43:07 fir-md1-s1 kernel: [] ? osp_object_alloc+0x40/0x170 [osp] May 01 23:43:07 fir-md1-s1 kernel: [] osp_object_alloc+0x40/0x170 [osp] May 01 23:43:07 fir-md1-s1 kernel: [] lod_object_init+0x1e7/0x3c0 [lod] May 01 23:43:07 fir-md1-s1 kernel: [] lu_object_alloc+0xe5/0x320 [obdclass] May 01 23:43:07 fir-md1-s1 kernel: [] lu_object_find_at+0x76/0x280 [obdclass] May 01 23:43:07 fir-md1-s1 kernel: [] lu_object_find_slice+0x1f/0x90 [obdclass] May 01 23:43:07 fir-md1-s1 kernel: [] mdd_object_find+0x10/0x70 [mdd] May 01 23:43:07 fir-md1-s1 kernel: [] obf_lookup+0x2c9/0x350 [mdd] May 01 23:43:07 fir-md1-s1 kernel: [] ? req_capsule_get_size+0x31/0x70 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0xf7c/0x1c30 [mdt] May 01 23:43:07 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? __req_capsule_get+0x15f/0x740 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? lustre_msg_get_flags+0x2c/0xa0 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] May 01 23:43:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 01 23:43:07 fir-md1-s1 kernel: [] ? mdt_intent_layout+0xcc0/0xcc0 [mdt] May 01 23:43:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs] May 01 23:43:07 fir-md1-s1 kernel: [] ? cfs_hash_add+0xbe/0x1a0 [libcfs] May 01 23:43:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? 
lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:07 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:07 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:07 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:10 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#44 stuck for 23s! 
[mdt_io00_041:103008] May 01 23:43:10 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif May 01 23:43:10 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] May 01 23:43:10 fir-md1-s1 kernel: CPU: 44 PID: 103008 Comm: mdt_io00_041 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:10 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:10 fir-md1-s1 kernel: task: ffff984b564bd140 ti: ffff984cf4430000 task.ti: ffff984cf4430000 May 01 23:43:10 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x128/0x200 May 01 23:43:10 fir-md1-s1 kernel: RSP: 0018:ffff984cf44338e8 EFLAGS: 00000246 May 01 23:43:10 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff985c7a711b78 RCX: 0000000001610000 May 01 23:43:10 fir-md1-s1 kernel: RDX: ffff984cff61b780 RSI: 0000000000110101 RDI: ffff982c9fc8c480 May 01 23:43:10 fir-md1-s1 kernel: RBP: ffff984cf44338e8 R08: ffff982cff0db780 R09: 0000000000000000 May 01 23:43:10 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff982cff0dac00 May 01 23:43:10 fir-md1-s1 kernel: R13: ffff984b564bd1a8 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:43:10 fir-md1-s1 kernel: FS: 00007fad68bc1880(0000) GS:ffff982cff0c0000(0000) knlGS:0000000000000000 May 01 23:43:10 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:10 fir-md1-s1 kernel: CR2: 00007fad62c1b090 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:10 fir-md1-s1 kernel: Call Trace: May 01 23:43:10 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:10 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:10 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:10 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:10 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:10 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:10 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:10 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:10 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] ? 
lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:10 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:10 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:10 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:10 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:10 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:10 fir-md1-s1 kernel: Code: 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 85 c0 <74> f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 85 c0 May 01 23:43:13 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [mdt_io03_049:103296] May 01 23:43:13 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! 
[mdt_io02_062:103114] May 01 23:43:13 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel May 01 23:43:13 fir-md1-s1 kernel: Modules linked in: May 01 23:43:13 fir-md1-s1 kernel: osp(OE) May 01 23:43:13 fir-md1-s1 kernel: mdd(OE) May 01 23:43:13 fir-md1-s1 kernel: lod(OE) May 01 23:43:13 fir-md1-s1 kernel: mdt(OE) May 01 23:43:13 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:13 fir-md1-s1 kernel: mgs(OE) May 01 23:43:13 fir-md1-s1 kernel: mgc(OE) May 01 23:43:13 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:13 fir-md1-s1 kernel: lquota(OE) May 01 23:43:13 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:13 fir-md1-s1 kernel: lustre(OE) May 01 23:43:13 fir-md1-s1 kernel: lmv(OE) May 01 23:43:13 fir-md1-s1 kernel: mdc(OE) May 01 23:43:13 fir-md1-s1 kernel: osc(OE) May 01 23:43:13 fir-md1-s1 kernel: lov(OE) May 01 23:43:13 fir-md1-s1 kernel: fid(OE) May 01 23:43:13 fir-md1-s1 kernel: fld(OE) May 01 23:43:13 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:13 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:13 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:13 fir-md1-s1 kernel: lnet(OE) May 01 23:43:13 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:13 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:13 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:13 fir-md1-s1 kernel: nfsv4 May 01 23:43:13 fir-md1-s1 kernel: dns_resolver May 01 23:43:13 fir-md1-s1 kernel: nfs May 01 23:43:13 fir-md1-s1 kernel: lockd May 01 23:43:13 fir-md1-s1 kernel: grace May 01 23:43:13 
fir-md1-s1 kernel: fscache May 01 23:43:13 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:13 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:13 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:13 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:13 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:13 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:13 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:13 fir-md1-s1 kernel: dell_rbu May 01 23:43:13 fir-md1-s1 kernel: sunrpc May 01 23:43:13 fir-md1-s1 kernel: vfat May 01 23:43:13 fir-md1-s1 kernel: fat May 01 23:43:13 fir-md1-s1 kernel: dm_round_robin May 01 23:43:13 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:13 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:13 fir-md1-s1 kernel: kvm_amd May 01 23:43:13 fir-md1-s1 kernel: kvm May 01 23:43:13 fir-md1-s1 kernel: ses May 01 23:43:13 fir-md1-s1 kernel: irqbypass May 01 23:43:13 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:13 fir-md1-s1 kernel: enclosure May 01 23:43:13 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:13 fir-md1-s1 kernel: dcdbas May 01 23:43:13 fir-md1-s1 kernel: aesni_intel May 01 23:43:13 fir-md1-s1 kernel: lrw May 01 23:43:13 fir-md1-s1 kernel: gf128mul May 01 23:43:13 fir-md1-s1 kernel: glue_helper May 01 23:43:13 fir-md1-s1 kernel: ablk_helper May 01 23:43:13 fir-md1-s1 kernel: cryptd May 01 23:43:13 fir-md1-s1 kernel: ipmi_si May 01 23:43:13 fir-md1-s1 kernel: pcspkr May 01 23:43:13 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:13 fir-md1-s1 kernel: ccp May 01 23:43:13 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:13 fir-md1-s1 kernel: dm_multipath May 01 23:43:13 fir-md1-s1 kernel: sg May 01 23:43:13 fir-md1-s1 kernel: k10temp May 01 23:43:13 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:13 fir-md1-s1 kernel: dm_mod May 01 23:43:13 fir-md1-s1 kernel: acpi_power_meter May 01 23:43:13 
fir-md1-s1 kernel: knem(OE) May 01 23:43:13 fir-md1-s1 kernel: ip_tables May 01 23:43:13 fir-md1-s1 kernel: ext4 May 01 23:43:13 fir-md1-s1 kernel: mbcache May 01 23:43:13 fir-md1-s1 kernel: jbd2 May 01 23:43:13 fir-md1-s1 kernel: sd_mod May 01 23:43:13 fir-md1-s1 kernel: crc_t10dif May 01 23:43:13 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:13 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:13 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:13 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:13 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:13 fir-md1-s1 kernel: syscopyarea May 01 23:43:13 fir-md1-s1 kernel: sysfillrect May 01 23:43:13 fir-md1-s1 kernel: sysimgblt May 01 23:43:13 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:13 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:13 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:13 fir-md1-s1 kernel: ttm May 01 23:43:13 fir-md1-s1 kernel: devlink May 01 23:43:13 fir-md1-s1 kernel: ahci May 01 23:43:13 fir-md1-s1 kernel: crct10dif_common May 01 23:43:13 fir-md1-s1 kernel: libahci May 01 23:43:13 fir-md1-s1 kernel: drm May 01 23:43:13 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:13 fir-md1-s1 kernel: tg3 May 01 23:43:13 fir-md1-s1 kernel: crc32c_intel May 01 23:43:13 fir-md1-s1 kernel: libata May 01 23:43:13 fir-md1-s1 kernel: megaraid_sas May 01 23:43:13 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:13 fir-md1-s1 kernel: ptp May 01 23:43:13 fir-md1-s1 kernel: pps_core May 01 23:43:13 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:13 fir-md1-s1 kernel: raid_class May 01 23:43:13 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:13 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:13 fir-md1-s1 kernel: May 01 23:43:13 fir-md1-s1 kernel: CPU: 10 PID: 103114 Comm: mdt_io02_062 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:13 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:13 fir-md1-s1 kernel: task: ffff985cdbd69040 ti: ffff985cfaf20000 task.ti: ffff985cfaf20000 May 01 23:43:13 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:13 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x126/0x200 May 01 23:43:13 fir-md1-s1 kernel: RSP: 0018:ffff985cfaf238e8 EFLAGS: 00000246 May 01 23:43:13 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff98267ca42b78 RCX: 0000000000510000 May 01 23:43:13 fir-md1-s1 kernel: RDX: ffff982cff0db780 RSI: 0000000001610101 RDI: ffff982c9fc8c480 May 01 23:43:13 fir-md1-s1 kernel: RBP: ffff985cfaf238e8 R08: ffff984cff69b780 R09: 0000000000000000 May 01 23:43:13 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff984cff69ac00 May 01 23:43:13 fir-md1-s1 kernel: R13: ffff985cdbd690a8 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:43:13 fir-md1-s1 kernel: FS: 00007f759b098700(0000) GS:ffff984cff680000(0000) knlGS:0000000000000000 May 01 23:43:13 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:13 fir-md1-s1 kernel: CR2: 00007f759e3fa000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:13 fir-md1-s1 kernel: Call Trace: May 01 23:43:13 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:13 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:13 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:13 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:13 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? load_balance+0x178/0x9a0 May 01 23:43:13 fir-md1-s1 kernel: [] ? 
tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:13 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:13 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:13 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:13 fir-md1-s1 kernel: Code: May 01 23:43:13 fir-md1-s1 kernel: 0d May 01 23:43:13 fir-md1-s1 kernel: 48 May 01 23:43:13 fir-md1-s1 kernel: 98 May 01 23:43:13 fir-md1-s1 kernel: 83 May 01 23:43:13 fir-md1-s1 kernel: e2 May 01 23:43:13 fir-md1-s1 kernel: 30 May 01 23:43:13 fir-md1-s1 kernel: 48 May 01 23:43:13 fir-md1-s1 kernel: 81 May 01 23:43:13 fir-md1-s1 kernel: c2 May 01 23:43:13 fir-md1-s1 kernel: 80 May 01 23:43:13 fir-md1-s1 kernel: b7 May 01 23:43:13 fir-md1-s1 kernel: 01 May 01 23:43:13 fir-md1-s1 kernel: 00 May 01 23:43:13 fir-md1-s1 kernel: 48 May 01 23:43:13 fir-md1-s1 kernel: 03 May 01 23:43:13 fir-md1-s1 kernel: 14 May 01 23:43:13 fir-md1-s1 kernel: c5 May 01 23:43:13 fir-md1-s1 kernel: 60 May 01 23:43:13 fir-md1-s1 kernel: b9 May 01 23:43:13 fir-md1-s1 kernel: b4 May 01 23:43:13 fir-md1-s1 kernel: b7 May 01 23:43:13 fir-md1-s1 kernel: 4c May 01 23:43:13 fir-md1-s1 kernel: 89 May 01 23:43:13 fir-md1-s1 kernel: 02 May 01 23:43:13 fir-md1-s1 
kernel: 41 May 01 23:43:13 fir-md1-s1 kernel: 8b May 01 23:43:13 fir-md1-s1 kernel: 40 May 01 23:43:13 fir-md1-s1 kernel: 08 May 01 23:43:13 fir-md1-s1 kernel: 85 May 01 23:43:13 fir-md1-s1 kernel: c0 May 01 23:43:13 fir-md1-s1 kernel: 75 May 01 23:43:13 fir-md1-s1 kernel: 0f May 01 23:43:13 fir-md1-s1 kernel: 0f May 01 23:43:13 fir-md1-s1 kernel: 1f May 01 23:43:13 fir-md1-s1 kernel: 44 May 01 23:43:13 fir-md1-s1 kernel: 00 May 01 23:43:13 fir-md1-s1 kernel: 00 May 01 23:43:13 fir-md1-s1 kernel: f3 May 01 23:43:13 fir-md1-s1 kernel: 90 May 01 23:43:13 fir-md1-s1 kernel: 41 May 01 23:43:13 fir-md1-s1 kernel: 8b May 01 23:43:13 fir-md1-s1 kernel: 40 May 01 23:43:13 fir-md1-s1 kernel: 08 May 01 23:43:13 fir-md1-s1 kernel: <85> May 01 23:43:13 fir-md1-s1 kernel: c0 May 01 23:43:13 fir-md1-s1 kernel: 74 May 01 23:43:13 fir-md1-s1 kernel: f6 May 01 23:43:13 fir-md1-s1 kernel: 4d May 01 23:43:13 fir-md1-s1 kernel: 8b May 01 23:43:13 fir-md1-s1 kernel: 08 May 01 23:43:13 fir-md1-s1 kernel: 4d May 01 23:43:13 fir-md1-s1 kernel: 85 May 01 23:43:13 fir-md1-s1 kernel: c9 May 01 23:43:13 fir-md1-s1 kernel: 74 May 01 23:43:13 fir-md1-s1 kernel: 04 May 01 23:43:13 fir-md1-s1 kernel: 41 May 01 23:43:13 fir-md1-s1 kernel: 0f May 01 23:43:13 fir-md1-s1 kernel: 18 May 01 23:43:13 fir-md1-s1 kernel: 09 May 01 23:43:13 fir-md1-s1 kernel: 8b May 01 23:43:13 fir-md1-s1 kernel: 17 May 01 23:43:13 fir-md1-s1 kernel: 0f May 01 23:43:13 fir-md1-s1 kernel: b7 May 01 23:43:13 fir-md1-s1 kernel: c2 May 01 23:43:13 fir-md1-s1 kernel: May 01 23:43:13 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#13 stuck for 22s! 
[mdt_io01_080:103078] May 01 23:43:13 fir-md1-s1 kernel: Modules linked in: May 01 23:43:13 fir-md1-s1 kernel: osp(OE) May 01 23:43:13 fir-md1-s1 kernel: mdd(OE) May 01 23:43:13 fir-md1-s1 kernel: lod(OE) May 01 23:43:13 fir-md1-s1 kernel: mdt(OE) May 01 23:43:13 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:13 fir-md1-s1 kernel: mgs(OE) May 01 23:43:13 fir-md1-s1 kernel: mgc(OE) May 01 23:43:13 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:13 fir-md1-s1 kernel: lquota(OE) May 01 23:43:13 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:13 fir-md1-s1 kernel: lustre(OE) May 01 23:43:13 fir-md1-s1 kernel: lmv(OE) May 01 23:43:13 fir-md1-s1 kernel: mdc(OE) May 01 23:43:13 fir-md1-s1 kernel: osc(OE) May 01 23:43:13 fir-md1-s1 kernel: lov(OE) May 01 23:43:13 fir-md1-s1 kernel: fid(OE) May 01 23:43:13 fir-md1-s1 kernel: fld(OE) May 01 23:43:13 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:13 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:13 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:13 fir-md1-s1 kernel: lnet(OE) May 01 23:43:13 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:13 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:13 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:13 fir-md1-s1 kernel: nfsv4 May 01 23:43:13 fir-md1-s1 kernel: dns_resolver May 01 23:43:13 fir-md1-s1 kernel: nfs May 01 23:43:13 fir-md1-s1 kernel: lockd May 01 23:43:13 fir-md1-s1 kernel: grace May 01 23:43:13 fir-md1-s1 kernel: fscache May 01 23:43:13 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:13 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:13 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:13 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:13 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:13 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:13 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:13 fir-md1-s1 kernel: dell_rbu May 01 23:43:13 
fir-md1-s1 kernel: sunrpc May 01 23:43:13 fir-md1-s1 kernel: vfat May 01 23:43:13 fir-md1-s1 kernel: fat May 01 23:43:13 fir-md1-s1 kernel: dm_round_robin May 01 23:43:13 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:13 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:13 fir-md1-s1 kernel: kvm_amd May 01 23:43:13 fir-md1-s1 kernel: kvm May 01 23:43:13 fir-md1-s1 kernel: ses May 01 23:43:13 fir-md1-s1 kernel: irqbypass May 01 23:43:13 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:13 fir-md1-s1 kernel: enclosure May 01 23:43:13 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:13 fir-md1-s1 kernel: dcdbas May 01 23:43:13 fir-md1-s1 kernel: aesni_intel May 01 23:43:13 fir-md1-s1 kernel: lrw May 01 23:43:13 fir-md1-s1 kernel: gf128mul May 01 23:43:13 fir-md1-s1 kernel: glue_helper May 01 23:43:13 fir-md1-s1 kernel: ablk_helper May 01 23:43:13 fir-md1-s1 kernel: cryptd May 01 23:43:13 fir-md1-s1 kernel: ipmi_si May 01 23:43:13 fir-md1-s1 kernel: pcspkr May 01 23:43:13 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:13 fir-md1-s1 kernel: ccp May 01 23:43:13 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:13 fir-md1-s1 kernel: dm_multipath May 01 23:43:13 fir-md1-s1 kernel: sg May 01 23:43:13 fir-md1-s1 kernel: k10temp May 01 23:43:13 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:13 fir-md1-s1 kernel: dm_mod May 01 23:43:13 fir-md1-s1 kernel: acpi_power_meter May 01 23:43:13 fir-md1-s1 kernel: knem(OE) May 01 23:43:13 fir-md1-s1 kernel: ip_tables May 01 23:43:13 fir-md1-s1 kernel: ext4 May 01 23:43:13 fir-md1-s1 kernel: mbcache May 01 23:43:13 fir-md1-s1 kernel: jbd2 May 01 23:43:13 fir-md1-s1 kernel: sd_mod May 01 23:43:13 fir-md1-s1 kernel: crc_t10dif May 01 23:43:13 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:13 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:13 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:13 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:13 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:13 fir-md1-s1 kernel: 
mlx5_core(OE) May 01 23:43:13 fir-md1-s1 kernel: syscopyarea May 01 23:43:13 fir-md1-s1 kernel: sysfillrect May 01 23:43:13 fir-md1-s1 kernel: sysimgblt May 01 23:43:13 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:13 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:13 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:13 fir-md1-s1 kernel: ttm May 01 23:43:13 fir-md1-s1 kernel: devlink May 01 23:43:13 fir-md1-s1 kernel: ahci May 01 23:43:13 fir-md1-s1 kernel: crct10dif_common May 01 23:43:13 fir-md1-s1 kernel: libahci May 01 23:43:13 fir-md1-s1 kernel: drm May 01 23:43:13 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:13 fir-md1-s1 kernel: tg3 May 01 23:43:13 fir-md1-s1 kernel: crc32c_intel May 01 23:43:13 fir-md1-s1 kernel: libata May 01 23:43:13 fir-md1-s1 kernel: megaraid_sas May 01 23:43:13 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:13 fir-md1-s1 kernel: ptp May 01 23:43:13 fir-md1-s1 kernel: pps_core May 01 23:43:13 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:13 fir-md1-s1 kernel: raid_class May 01 23:43:13 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:13 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:13 fir-md1-s1 kernel: May 01 23:43:13 fir-md1-s1 kernel: CPU: 13 PID: 103078 Comm: mdt_io01_080 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:13 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:13 fir-md1-s1 kernel: task: ffff985cfe900000 ti: ffff985bc0140000 task.ti: ffff985bc0140000 May 01 23:43:13 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:13 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:13 fir-md1-s1 kernel: RSP: 0018:ffff985bc01438e8 EFLAGS: 00000246 May 01 23:43:13 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff98267ca42b78 RCX: 0000000000690000 May 01 23:43:13 fir-md1-s1 kernel: RDX: ffff984cff69b780 RSI: 0000000000510101 RDI: ffff982c9fc8c480 May 01 23:43:13 fir-md1-s1 kernel: RBP: ffff985bc01438e8 R08: ffff983cff6db780 R09: 0000000000000000 May 01 23:43:13 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff983cff7dac00 May 01 23:43:13 fir-md1-s1 kernel: R13: ffff985cfe900068 R14: 00ff985bc0143850 R15: ffff984cf34000a0 May 01 23:43:13 fir-md1-s1 kernel: FS: 00007f7593fff700(0000) GS:ffff983cff6c0000(0000) knlGS:0000000000000000 May 01 23:43:13 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:13 fir-md1-s1 kernel: CR2: 00007f759e3fa000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:13 fir-md1-s1 kernel: Call Trace: May 01 23:43:13 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:13 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:13 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] ? 
ktime_get_ts64+0x52/0xf0 May 01 23:43:13 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:13 fir-md1-s1 kernel: dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] May 01 23:43:13 fir-md1-s1 kernel: CPU: 3 PID: 103296 Comm: mdt_io03_049 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:13 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:13 fir-md1-s1 kernel: task: ffff982cf0b6d140 ti: ffff98283e6f8000 task.ti: ffff98283e6f8000 May 01 23:43:13 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:13 fir-md1-s1 kernel: RSP: 0018:ffff98283e6fb8e8 EFLAGS: 00000246 May 01 23:43:13 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff984851f7a378 RCX: 0000000000190000 May 01 23:43:13 fir-md1-s1 kernel: RDX: ffff982cff05b780 RSI: 0000000001210101 RDI: ffff982c9fc8c480 May 01 23:43:13 fir-md1-s1 kernel: RBP: ffff98283e6fb8e8 R08: ffff985d3f41b780 R09: 0000000000000000 May 01 23:43:13 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff985d3f41ac00 May 01 23:43:13 fir-md1-s1 kernel: R13: ffff982cf0b6d1a8 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:43:13 fir-md1-s1 kernel: FS: 00007f0117f95880(0000) GS:ffff985d3f400000(0000) knlGS:0000000000000000 May 01 23:43:13 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:13 fir-md1-s1 kernel: CR2: 00007f010587e03c CR3: 00000030346d6000 CR4: 00000000003407e0 May 01 23:43:13 fir-md1-s1 kernel: Call Trace: May 01 23:43:13 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:13 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:13 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:13 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:13 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:13 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:13 fir-md1-s1 kernel: [] ? 
tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:13 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:13 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:13 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:13 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:13 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:13 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:13 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:13 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b May 01 23:43:15 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#36 stuck for 22s! 
[mdt_io00_043:103023] May 01 23:43:15 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif May 01 23:43:15 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] May 01 23:43:15 fir-md1-s1 kernel: CPU: 36 PID: 103023 Comm: mdt_io00_043 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:15 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:15 fir-md1-s1 kernel: task: ffff98379d684100 ti: ffff9837162a0000 task.ti: ffff9837162a0000 May 01 23:43:15 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:15 fir-md1-s1 kernel: RSP: 0018:ffff9837162a38e8 EFLAGS: 00000246 May 01 23:43:15 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff9837d662db78 RCX: 0000000001210000 May 01 23:43:15 fir-md1-s1 kernel: RDX: ffff983cff6db780 RSI: 0000000000690101 RDI: ffff982c9fc8c480 May 01 23:43:15 fir-md1-s1 kernel: RBP: ffff9837162a38e8 R08: ffff982cff05b780 R09: 0000000000000000 May 01 23:43:15 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: 0000000000000000 May 01 23:43:15 fir-md1-s1 kernel: R13: 0000000000000000 R14: 00ff981e3fd3ec00 R15: ffff984cf34000a0 May 01 23:43:15 fir-md1-s1 kernel: FS: 00007f67c209f740(0000) GS:ffff982cff040000(0000) knlGS:0000000000000000 May 01 23:43:15 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:15 fir-md1-s1 kernel: CR2: 00007f67c0de780d CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:15 fir-md1-s1 kernel: Call Trace: May 01 23:43:15 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:15 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:15 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:15 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:15 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:15 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:15 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:15 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:15 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? load_balance+0x178/0x9a0 May 01 23:43:15 fir-md1-s1 kernel: [] ? 
__enqueue_entity+0x78/0x80 May 01 23:43:15 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:15 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:15 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:15 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:15 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b May 01 23:43:15 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:15 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:15 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? 
ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:15 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:15 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:15 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:15 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b May 01 23:43:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#26 stuck for 23s! 
[mdt_io02_043:103027] May 01 23:43:18 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif May 01 23:43:18 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas [last unloaded: libcfs] May 01 23:43:18 fir-md1-s1 kernel: CPU: 26 PID: 103027 Comm: mdt_io02_043 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:18 fir-md1-s1 kernel: Hardware name: Dell Inc. 
PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:18 fir-md1-s1 kernel: task: ffff982c812730c0 ti: ffff983a5ba80000 task.ti: ffff983a5ba80000 May 01 23:43:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:18 fir-md1-s1 kernel: RSP: 0018:ffff983a5ba83800 EFLAGS: 00000246 May 01 23:43:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff983165526b60 RCX: 0000000000d10000 May 01 23:43:18 fir-md1-s1 kernel: RDX: ffff982cfeedb780 RSI: 0000000000610101 RDI: ffff982c9fc8c480 May 01 23:43:18 fir-md1-s1 kernel: RBP: ffff983a5ba83800 R08: ffff984cff79b780 R09: 0000000000000000 May 01 23:43:18 fir-md1-s1 kernel: R10: ffff984cff79f140 R11: ffffde3f6b96dc00 R12: 0000000000000000 May 01 23:43:18 fir-md1-s1 kernel: R13: ffff983a5ba837a0 R14: ffff9831655268d0 R15: 0000000000000000 May 01 23:43:18 fir-md1-s1 kernel: FS: 00007fa424097780(0000) GS:ffff984cff780000(0000) knlGS:0000000000000000 May 01 23:43:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:18 fir-md1-s1 kernel: Lustre: fir-MDT0000-lwp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete May 01 23:43:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID May 01 23:43:18 fir-md1-s1 kernel: CR2: 00007fa4240a8000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:18 fir-md1-s1 kernel: Call Trace: May 01 23:43:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] May 01 23:43:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 May 01 23:43:18 fir-md1-s1 kernel: [] ? 
kiblnd_check_sends_locked+0xa72/0xe40 [ko2iblnd] May 01 23:43:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] May 01 23:43:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:18 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:18 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:18 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:18 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:18 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b May 01 23:43:19 fir-md1-s1 kernel: Lustre: 103137:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 29s req@ffff984763ab9b00 x1631768250398768/t0(0) o35->1029f32e-c536-81b0-6441-16ee4f005637@10.8.22.9@o2ib6:0/0 lens 392/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 May 01 23:43:19 fir-md1-s1 kernel: Lustre: mdt_readpage: This server is not able to keep up with request traffic (cpu-bound). May 01 23:43:19 fir-md1-s1 kernel: Lustre: 103137:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=49 reqQ=448 recA=59, svcEst=20, delay=34252 May 01 23:43:19 fir-md1-s1 kernel: Lustre: 103137:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-5s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff984bf8a23c50 x1631604127569888/t0(0) o101->fir-MDT0000-lwp-OST001b_UUID@10.0.10.106@o2ib7:14/0 lens 456/0 e 0 to 0 dl 1556779394 ref 2 fl New:/0/ffffffff rc 0/-1 May 01 23:43:23 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! 
[mdt_io02_001:101733] May 01 23:43:23 fir-md1-s1 kernel: Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm ses irqbypass crc32_pclmul enclosure ghash_clmulni_intel dcdbas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_si pcspkr ipmi_devintf ccp i2c_piix4 dm_multipath sg k10temp ipmi_msghandler dm_mod acpi_power_meter knem(OE) ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif May 01 23:43:23 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [mdt_io03_043:103125] May 01 23:43:23 fir-md1-s1 kernel: crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) i2c_algo_bit drm_kms_helper mlx5_core(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops mlxfw(OE) crct10dif_pclmul ttm devlink ahci crct10dif_common libahci drm mlx_compat(OE) tg3 crc32c_intel libata megaraid_sas drm_panel_orientation_quirks ptp pps_core mpt3sas(OE) raid_class scsi_transport_sas May 01 23:43:23 fir-md1-s1 kernel: Modules linked in: May 01 23:43:23 fir-md1-s1 kernel: osp(OE) May 01 23:43:23 fir-md1-s1 kernel: mdd(OE) May 01 23:43:23 fir-md1-s1 kernel: lod(OE) May 01 23:43:23 fir-md1-s1 kernel: mdt(OE) May 01 23:43:23 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:23 fir-md1-s1 kernel: mgs(OE) May 01 23:43:23 fir-md1-s1 kernel: mgc(OE) May 01 23:43:23 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:23 fir-md1-s1 kernel: lquota(OE) May 01 23:43:23 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:23 fir-md1-s1 kernel: lustre(OE) May 01 23:43:23 fir-md1-s1 kernel: lmv(OE) May 01 23:43:23 fir-md1-s1 kernel: mdc(OE) May 01 
23:43:23 fir-md1-s1 kernel: osc(OE) May 01 23:43:23 fir-md1-s1 kernel: lov(OE) May 01 23:43:23 fir-md1-s1 kernel: fid(OE) May 01 23:43:23 fir-md1-s1 kernel: fld(OE) May 01 23:43:23 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:23 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:23 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:23 fir-md1-s1 kernel: lnet(OE) May 01 23:43:23 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:23 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:23 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:23 fir-md1-s1 kernel: nfsv4 May 01 23:43:23 fir-md1-s1 kernel: dns_resolver May 01 23:43:23 fir-md1-s1 kernel: nfs May 01 23:43:23 fir-md1-s1 kernel: lockd May 01 23:43:23 fir-md1-s1 kernel: grace May 01 23:43:23 fir-md1-s1 kernel: fscache May 01 23:43:23 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:23 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:23 fir-md1-s1 kernel: dell_rbu May 01 23:43:23 fir-md1-s1 kernel: sunrpc May 01 23:43:23 fir-md1-s1 kernel: vfat May 01 23:43:23 fir-md1-s1 kernel: fat May 01 23:43:23 fir-md1-s1 kernel: dm_round_robin May 01 23:43:23 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:23 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:23 fir-md1-s1 kernel: kvm_amd May 01 23:43:23 fir-md1-s1 kernel: kvm May 01 23:43:23 fir-md1-s1 kernel: ses May 01 23:43:23 fir-md1-s1 kernel: irqbypass May 01 23:43:23 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:23 fir-md1-s1 kernel: enclosure May 01 23:43:23 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:23 fir-md1-s1 kernel: dcdbas May 01 23:43:23 fir-md1-s1 kernel: aesni_intel May 01 
23:43:23 fir-md1-s1 kernel: lrw May 01 23:43:23 fir-md1-s1 kernel: gf128mul May 01 23:43:23 fir-md1-s1 kernel: glue_helper May 01 23:43:23 fir-md1-s1 kernel: ablk_helper May 01 23:43:23 fir-md1-s1 kernel: cryptd May 01 23:43:23 fir-md1-s1 kernel: ipmi_si May 01 23:43:23 fir-md1-s1 kernel: pcspkr May 01 23:43:23 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:23 fir-md1-s1 kernel: ccp May 01 23:43:23 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:23 fir-md1-s1 kernel: dm_multipath May 01 23:43:23 fir-md1-s1 kernel: sg May 01 23:43:23 fir-md1-s1 kernel: k10temp May 01 23:43:23 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:23 fir-md1-s1 kernel: dm_mod May 01 23:43:23 fir-md1-s1 kernel: acpi_power_meter May 01 23:43:23 fir-md1-s1 kernel: knem(OE) May 01 23:43:23 fir-md1-s1 kernel: ip_tables May 01 23:43:23 fir-md1-s1 kernel: ext4 May 01 23:43:23 fir-md1-s1 kernel: mbcache May 01 23:43:23 fir-md1-s1 kernel: jbd2 May 01 23:43:23 fir-md1-s1 kernel: sd_mod May 01 23:43:23 fir-md1-s1 kernel: crc_t10dif May 01 23:43:23 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:23 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:23 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:23 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:23 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:23 fir-md1-s1 kernel: syscopyarea May 01 23:43:23 fir-md1-s1 kernel: sysfillrect May 01 23:43:23 fir-md1-s1 kernel: sysimgblt May 01 23:43:23 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:23 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:23 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:23 fir-md1-s1 kernel: ttm May 01 23:43:23 fir-md1-s1 kernel: devlink May 01 23:43:23 fir-md1-s1 kernel: ahci May 01 23:43:23 fir-md1-s1 kernel: crct10dif_common May 01 23:43:23 fir-md1-s1 kernel: libahci May 01 23:43:23 fir-md1-s1 kernel: drm May 01 23:43:23 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:23 fir-md1-s1 kernel: tg3 May 01 23:43:23 
fir-md1-s1 kernel: crc32c_intel May 01 23:43:23 fir-md1-s1 kernel: libata May 01 23:43:23 fir-md1-s1 kernel: megaraid_sas May 01 23:43:23 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:23 fir-md1-s1 kernel: ptp May 01 23:43:23 fir-md1-s1 kernel: pps_core May 01 23:43:23 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:23 fir-md1-s1 kernel: raid_class May 01 23:43:23 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:23 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:23 fir-md1-s1 kernel: May 01 23:43:23 fir-md1-s1 kernel: CPU: 15 PID: 103125 Comm: mdt_io03_043 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:23 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:23 fir-md1-s1 kernel: task: ffff985912f64100 ti: ffff9858407d0000 task.ti: ffff9858407d0000 May 01 23:43:23 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:23 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x128/0x200 May 01 23:43:23 fir-md1-s1 kernel: RSP: 0018:ffff9858407d38e8 EFLAGS: 00000246 May 01 23:43:23 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff984851f7a378 RCX: 0000000000790000 May 01 23:43:23 fir-md1-s1 kernel: RDX: ffff982cff01b780 RSI: 0000000001010101 RDI: ffff982c9fc8c480 May 01 23:43:23 fir-md1-s1 kernel: RBP: ffff9858407d38e8 R08: ffff985d3f4db780 R09: 0000000000000000 May 01 23:43:23 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff985d3f55ac00 May 01 23:43:23 fir-md1-s1 kernel: R13: ffff985912f64168 R14: 00ff9858407d3850 R15: ffff984cf34000a0 May 01 23:43:23 fir-md1-s1 kernel: FS: 00007f63c1c68740(0000) GS:ffff985d3f4c0000(0000) knlGS:0000000000000000 May 01 23:43:23 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:23 fir-md1-s1 kernel: CR2: 00007ff884d9fd1c CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:23 fir-md1-s1 kernel: Call Trace: May 01 23:43:23 fir-md1-s1 kernel: [] 
queued_spin_lock_slowpath+0xb/0xf May 01 23:43:23 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:23 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:23 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:23 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? load_balance+0x178/0x9a0 May 01 23:43:23 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:23 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:23 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:23 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:23 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:23 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:23 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:23 fir-md1-s1 kernel: Code: May 01 23:43:23 fir-md1-s1 kernel: 98 May 01 23:43:23 fir-md1-s1 kernel: 83 May 01 23:43:23 fir-md1-s1 kernel: e2 May 01 23:43:23 fir-md1-s1 kernel: 30 May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: 81 May 01 23:43:23 fir-md1-s1 kernel: c2 May 01 23:43:23 fir-md1-s1 kernel: 80 May 01 23:43:23 fir-md1-s1 kernel: b7 May 01 23:43:23 fir-md1-s1 kernel: 01 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: 03 May 01 23:43:23 fir-md1-s1 kernel: 14 May 01 23:43:23 fir-md1-s1 kernel: c5 May 01 23:43:23 fir-md1-s1 kernel: 60 May 01 23:43:23 fir-md1-s1 kernel: b9 May 01 23:43:23 fir-md1-s1 kernel: b4 May 01 23:43:23 fir-md1-s1 kernel: b7 May 01 23:43:23 fir-md1-s1 kernel: 4c May 01 23:43:23 fir-md1-s1 kernel: 89 May 01 23:43:23 fir-md1-s1 kernel: 02 May 01 23:43:23 fir-md1-s1 kernel: 41 May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 40 May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c0 May 01 23:43:23 fir-md1-s1 kernel: 75 May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 1f May 01 23:43:23 fir-md1-s1 kernel: 44 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: f3 May 01 23:43:23 fir-md1-s1 kernel: 90 May 01 23:43:23 fir-md1-s1 kernel: 41 May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 40 May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c0 May 01 23:43:23 fir-md1-s1 kernel: <74> May 01 23:43:23 fir-md1-s1 kernel: f6 May 01 23:43:23 fir-md1-s1 kernel: 4d May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 4d May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 
kernel: c9 May 01 23:43:23 fir-md1-s1 kernel: 74 May 01 23:43:23 fir-md1-s1 kernel: 04 May 01 23:43:23 fir-md1-s1 kernel: 41 May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 18 May 01 23:43:23 fir-md1-s1 kernel: 09 May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 17 May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: b7 May 01 23:43:23 fir-md1-s1 kernel: c2 May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c0 May 01 23:43:23 fir-md1-s1 kernel: May 01 23:43:23 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#21 stuck for 23s! [mdt_io01_085:103094] May 01 23:43:23 fir-md1-s1 kernel: Modules linked in: May 01 23:43:23 fir-md1-s1 kernel: osp(OE) May 01 23:43:23 fir-md1-s1 kernel: mdd(OE) May 01 23:43:23 fir-md1-s1 kernel: lod(OE) May 01 23:43:23 fir-md1-s1 kernel: mdt(OE) May 01 23:43:23 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:23 fir-md1-s1 kernel: mgs(OE) May 01 23:43:23 fir-md1-s1 kernel: mgc(OE) May 01 23:43:23 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:23 fir-md1-s1 kernel: lquota(OE) May 01 23:43:23 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:23 fir-md1-s1 kernel: lustre(OE) May 01 23:43:23 fir-md1-s1 kernel: lmv(OE) May 01 23:43:23 fir-md1-s1 kernel: mdc(OE) May 01 23:43:23 fir-md1-s1 kernel: osc(OE) May 01 23:43:23 fir-md1-s1 kernel: lov(OE) May 01 23:43:23 fir-md1-s1 kernel: fid(OE) May 01 23:43:23 fir-md1-s1 kernel: fld(OE) May 01 23:43:23 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:23 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:23 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:23 fir-md1-s1 kernel: lnet(OE) May 01 23:43:23 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:23 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:23 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:23 fir-md1-s1 kernel: nfsv4 May 01 23:43:23 fir-md1-s1 kernel: dns_resolver May 01 23:43:23 fir-md1-s1 kernel: nfs May 01 23:43:23 fir-md1-s1 kernel: lockd May 01 23:43:23 fir-md1-s1 kernel: grace 
May 01 23:43:23 fir-md1-s1 kernel: fscache May 01 23:43:23 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:23 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:23 fir-md1-s1 kernel: dell_rbu May 01 23:43:23 fir-md1-s1 kernel: sunrpc May 01 23:43:23 fir-md1-s1 kernel: vfat May 01 23:43:23 fir-md1-s1 kernel: fat May 01 23:43:23 fir-md1-s1 kernel: dm_round_robin May 01 23:43:23 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:23 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:23 fir-md1-s1 kernel: kvm_amd May 01 23:43:23 fir-md1-s1 kernel: kvm May 01 23:43:23 fir-md1-s1 kernel: ses May 01 23:43:23 fir-md1-s1 kernel: irqbypass May 01 23:43:23 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:23 fir-md1-s1 kernel: enclosure May 01 23:43:23 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:23 fir-md1-s1 kernel: dcdbas May 01 23:43:23 fir-md1-s1 kernel: aesni_intel May 01 23:43:23 fir-md1-s1 kernel: lrw May 01 23:43:23 fir-md1-s1 kernel: gf128mul May 01 23:43:23 fir-md1-s1 kernel: glue_helper May 01 23:43:23 fir-md1-s1 kernel: ablk_helper May 01 23:43:23 fir-md1-s1 kernel: cryptd May 01 23:43:23 fir-md1-s1 kernel: ipmi_si May 01 23:43:23 fir-md1-s1 kernel: pcspkr May 01 23:43:23 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:23 fir-md1-s1 kernel: ccp May 01 23:43:23 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:23 fir-md1-s1 kernel: dm_multipath May 01 23:43:23 fir-md1-s1 kernel: sg May 01 23:43:23 fir-md1-s1 kernel: k10temp May 01 23:43:23 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:23 fir-md1-s1 kernel: dm_mod May 01 23:43:23 fir-md1-s1 kernel: acpi_power_meter May 01 
23:43:23 fir-md1-s1 kernel: knem(OE) May 01 23:43:23 fir-md1-s1 kernel: ip_tables May 01 23:43:23 fir-md1-s1 kernel: ext4 May 01 23:43:23 fir-md1-s1 kernel: mbcache May 01 23:43:23 fir-md1-s1 kernel: jbd2 May 01 23:43:23 fir-md1-s1 kernel: sd_mod May 01 23:43:23 fir-md1-s1 kernel: crc_t10dif May 01 23:43:23 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:23 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:23 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:23 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:23 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:23 fir-md1-s1 kernel: syscopyarea May 01 23:43:23 fir-md1-s1 kernel: sysfillrect May 01 23:43:23 fir-md1-s1 kernel: sysimgblt May 01 23:43:23 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:23 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:23 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:23 fir-md1-s1 kernel: ttm May 01 23:43:23 fir-md1-s1 kernel: devlink May 01 23:43:23 fir-md1-s1 kernel: ahci May 01 23:43:23 fir-md1-s1 kernel: crct10dif_common May 01 23:43:23 fir-md1-s1 kernel: libahci May 01 23:43:23 fir-md1-s1 kernel: drm May 01 23:43:23 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:23 fir-md1-s1 kernel: tg3 May 01 23:43:23 fir-md1-s1 kernel: crc32c_intel May 01 23:43:23 fir-md1-s1 kernel: libata May 01 23:43:23 fir-md1-s1 kernel: megaraid_sas May 01 23:43:23 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:23 fir-md1-s1 kernel: ptp May 01 23:43:23 fir-md1-s1 kernel: pps_core May 01 23:43:23 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:23 fir-md1-s1 kernel: raid_class May 01 23:43:23 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:23 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:23 fir-md1-s1 kernel: May 01 23:43:23 fir-md1-s1 kernel: CPU: 21 PID: 103094 Comm: mdt_io01_085 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:23 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:23 fir-md1-s1 kernel: task: ffff982c9ff3c100 ti: ffff98596d728000 task.ti: ffff98596d728000 May 01 23:43:23 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:23 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:23 fir-md1-s1 kernel: RSP: 0018:ffff98596d72b8e8 EFLAGS: 00000246 May 01 23:43:23 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff98267ca42b78 RCX: 0000000000a90000 May 01 23:43:23 fir-md1-s1 kernel: RDX: ffff984cff79b780 RSI: 0000000000d10101 RDI: ffff982c9fc8c480 May 01 23:43:23 fir-md1-s1 kernel: RBP: ffff98596d72b8e8 R08: ffff983cff75b780 R09: 0000000000000000 May 01 23:43:23 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff983cff65ac00 May 01 23:43:23 fir-md1-s1 kernel: R13: ffff982c9ff3c168 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:43:23 fir-md1-s1 kernel: FS: 00007fad68bc1880(0000) GS:ffff983cff740000(0000) knlGS:0000000000000000 May 01 23:43:23 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:23 fir-md1-s1 kernel: CR2: 00007ffcc3425f98 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:23 fir-md1-s1 kernel: Call Trace: May 01 23:43:23 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:23 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:23 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:23 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:23 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? 
lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:23 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:23 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:23 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:23 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:23 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:23 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:23 fir-md1-s1 kernel: Code: May 01 23:43:23 fir-md1-s1 kernel: 13 May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: c1 May 01 23:43:23 fir-md1-s1 kernel: ea May 01 23:43:23 fir-md1-s1 kernel: 0d May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: 98 May 01 23:43:23 fir-md1-s1 kernel: 83 May 01 23:43:23 fir-md1-s1 kernel: e2 May 01 23:43:23 fir-md1-s1 kernel: 30 May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: 81 May 01 23:43:23 fir-md1-s1 kernel: c2 May 01 23:43:23 fir-md1-s1 kernel: 80 May 01 23:43:23 fir-md1-s1 kernel: b7 May 01 23:43:23 fir-md1-s1 kernel: 01 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: 03 May 01 23:43:23 fir-md1-s1 kernel: 14 May 01 23:43:23 fir-md1-s1 kernel: c5 May 01 23:43:23 fir-md1-s1 kernel: 60 May 01 23:43:23 fir-md1-s1 kernel: b9 May 01 23:43:23 fir-md1-s1 kernel: b4 May 01 23:43:23 fir-md1-s1 kernel: b7 May 01 23:43:23 fir-md1-s1 kernel: 4c May 01 23:43:23 fir-md1-s1 kernel: 89 May 01 23:43:23 fir-md1-s1 kernel: 02 May 01 23:43:23 fir-md1-s1 kernel: 41 May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 40 May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c0 May 01 23:43:23 fir-md1-s1 kernel: 75 May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 1f May 01 23:43:23 fir-md1-s1 kernel: 44 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: f3 May 01 23:43:23 fir-md1-s1 kernel: 90 May 01 23:43:23 fir-md1-s1 kernel: <41> May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 40 May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c0 May 01 23:43:23 fir-md1-s1 kernel: 74 May 01 23:43:23 fir-md1-s1 
kernel: f6 May 01 23:43:23 fir-md1-s1 kernel: 4d May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 4d May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c9 May 01 23:43:23 fir-md1-s1 kernel: 74 May 01 23:43:23 fir-md1-s1 kernel: 04 May 01 23:43:23 fir-md1-s1 kernel: 41 May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 18 May 01 23:43:23 fir-md1-s1 kernel: 09 May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: May 01 23:43:23 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#32 stuck for 23s! [mdt_io00_072:103262] May 01 23:43:23 fir-md1-s1 kernel: Modules linked in: May 01 23:43:23 fir-md1-s1 kernel: osp(OE) May 01 23:43:23 fir-md1-s1 kernel: mdd(OE) May 01 23:43:23 fir-md1-s1 kernel: lod(OE) May 01 23:43:23 fir-md1-s1 kernel: mdt(OE) May 01 23:43:23 fir-md1-s1 kernel: lfsck(OE) May 01 23:43:23 fir-md1-s1 kernel: mgs(OE) May 01 23:43:23 fir-md1-s1 kernel: mgc(OE) May 01 23:43:23 fir-md1-s1 kernel: osd_ldiskfs(OE) May 01 23:43:23 fir-md1-s1 kernel: lquota(OE) May 01 23:43:23 fir-md1-s1 kernel: ldiskfs(OE) May 01 23:43:23 fir-md1-s1 kernel: lustre(OE) May 01 23:43:23 fir-md1-s1 kernel: lmv(OE) May 01 23:43:23 fir-md1-s1 kernel: mdc(OE) May 01 23:43:23 fir-md1-s1 kernel: osc(OE) May 01 23:43:23 fir-md1-s1 kernel: lov(OE) May 01 23:43:23 fir-md1-s1 kernel: fid(OE) May 01 23:43:23 fir-md1-s1 kernel: fld(OE) May 01 23:43:23 fir-md1-s1 kernel: ko2iblnd(OE) May 01 23:43:23 fir-md1-s1 kernel: ptlrpc(OE) May 01 23:43:23 fir-md1-s1 kernel: obdclass(OE) May 01 23:43:23 fir-md1-s1 kernel: lnet(OE) May 01 23:43:23 fir-md1-s1 kernel: libcfs(OE) May 01 23:43:23 fir-md1-s1 kernel: rpcsec_gss_krb5 May 01 23:43:23 fir-md1-s1 kernel: auth_rpcgss May 01 23:43:23 fir-md1-s1 kernel: nfsv4 May 01 23:43:23 fir-md1-s1 kernel: dns_resolver May 01 23:43:23 fir-md1-s1 kernel: nfs May 01 23:43:23 fir-md1-s1 kernel: lockd May 01 23:43:23 fir-md1-s1 kernel: grace 
May 01 23:43:23 fir-md1-s1 kernel: fscache May 01 23:43:23 fir-md1-s1 kernel: rdma_ucm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_ucm(OE) May 01 23:43:23 fir-md1-s1 kernel: rdma_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: iw_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_ipoib(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_cm(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_umad(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx5_fpga_tools(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_en(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_ib(OE) May 01 23:43:23 fir-md1-s1 kernel: mlx4_core(OE) May 01 23:43:23 fir-md1-s1 kernel: dell_rbu May 01 23:43:23 fir-md1-s1 kernel: sunrpc May 01 23:43:23 fir-md1-s1 kernel: vfat May 01 23:43:23 fir-md1-s1 kernel: fat May 01 23:43:23 fir-md1-s1 kernel: dm_round_robin May 01 23:43:23 fir-md1-s1 kernel: amd64_edac_mod May 01 23:43:23 fir-md1-s1 kernel: edac_mce_amd May 01 23:43:23 fir-md1-s1 kernel: kvm_amd May 01 23:43:23 fir-md1-s1 kernel: kvm May 01 23:43:23 fir-md1-s1 kernel: ses May 01 23:43:23 fir-md1-s1 kernel: irqbypass May 01 23:43:23 fir-md1-s1 kernel: crc32_pclmul May 01 23:43:23 fir-md1-s1 kernel: enclosure May 01 23:43:23 fir-md1-s1 kernel: ghash_clmulni_intel May 01 23:43:23 fir-md1-s1 kernel: dcdbas May 01 23:43:23 fir-md1-s1 kernel: aesni_intel May 01 23:43:23 fir-md1-s1 kernel: lrw May 01 23:43:23 fir-md1-s1 kernel: gf128mul May 01 23:43:23 fir-md1-s1 kernel: glue_helper May 01 23:43:23 fir-md1-s1 kernel: ablk_helper May 01 23:43:23 fir-md1-s1 kernel: cryptd May 01 23:43:23 fir-md1-s1 kernel: ipmi_si May 01 23:43:23 fir-md1-s1 kernel: pcspkr May 01 23:43:23 fir-md1-s1 kernel: ipmi_devintf May 01 23:43:23 fir-md1-s1 kernel: ccp May 01 23:43:23 fir-md1-s1 kernel: i2c_piix4 May 01 23:43:23 fir-md1-s1 kernel: dm_multipath May 01 23:43:23 fir-md1-s1 kernel: sg May 01 23:43:23 fir-md1-s1 kernel: k10temp May 01 23:43:23 fir-md1-s1 kernel: ipmi_msghandler May 01 23:43:23 fir-md1-s1 kernel: dm_mod May 01 23:43:23 fir-md1-s1 kernel: acpi_power_meter May 01 
23:43:23 fir-md1-s1 kernel: knem(OE) May 01 23:43:23 fir-md1-s1 kernel: ip_tables May 01 23:43:23 fir-md1-s1 kernel: ext4 May 01 23:43:23 fir-md1-s1 kernel: mbcache May 01 23:43:23 fir-md1-s1 kernel: jbd2 May 01 23:43:23 fir-md1-s1 kernel: sd_mod May 01 23:43:23 fir-md1-s1 kernel: crc_t10dif May 01 23:43:23 fir-md1-s1 kernel: crct10dif_generic May 01 23:43:23 fir-md1-s1 kernel: mlx5_ib(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_uverbs(OE) May 01 23:43:23 fir-md1-s1 kernel: ib_core(OE) May 01 23:43:23 fir-md1-s1 kernel: i2c_algo_bit May 01 23:43:23 fir-md1-s1 kernel: drm_kms_helper May 01 23:43:23 fir-md1-s1 kernel: mlx5_core(OE) May 01 23:43:23 fir-md1-s1 kernel: syscopyarea May 01 23:43:23 fir-md1-s1 kernel: sysfillrect May 01 23:43:23 fir-md1-s1 kernel: sysimgblt May 01 23:43:23 fir-md1-s1 kernel: fb_sys_fops May 01 23:43:23 fir-md1-s1 kernel: mlxfw(OE) May 01 23:43:23 fir-md1-s1 kernel: crct10dif_pclmul May 01 23:43:23 fir-md1-s1 kernel: ttm May 01 23:43:23 fir-md1-s1 kernel: devlink May 01 23:43:23 fir-md1-s1 kernel: ahci May 01 23:43:23 fir-md1-s1 kernel: crct10dif_common May 01 23:43:23 fir-md1-s1 kernel: libahci May 01 23:43:23 fir-md1-s1 kernel: drm May 01 23:43:23 fir-md1-s1 kernel: mlx_compat(OE) May 01 23:43:23 fir-md1-s1 kernel: tg3 May 01 23:43:23 fir-md1-s1 kernel: crc32c_intel May 01 23:43:23 fir-md1-s1 kernel: libata May 01 23:43:23 fir-md1-s1 kernel: megaraid_sas May 01 23:43:23 fir-md1-s1 kernel: drm_panel_orientation_quirks May 01 23:43:23 fir-md1-s1 kernel: ptp May 01 23:43:23 fir-md1-s1 kernel: pps_core May 01 23:43:23 fir-md1-s1 kernel: mpt3sas(OE) May 01 23:43:23 fir-md1-s1 kernel: raid_class May 01 23:43:23 fir-md1-s1 kernel: scsi_transport_sas May 01 23:43:23 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:23 fir-md1-s1 kernel: May 01 23:43:23 fir-md1-s1 kernel: CPU: 32 PID: 103262 Comm: mdt_io00_072 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:23 fir-md1-s1 kernel: Hardware name: Dell 
Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:23 fir-md1-s1 kernel: task: ffff984cba239040 ti: ffff982bc9ed4000 task.ti: ffff982bc9ed4000 May 01 23:43:23 fir-md1-s1 kernel: RIP: 0010:[] May 01 23:43:23 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:23 fir-md1-s1 kernel: RSP: 0018:ffff982bc9ed78e8 EFLAGS: 00000246 May 01 23:43:23 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff9837d662bb78 RCX: 0000000001010000 May 01 23:43:23 fir-md1-s1 kernel: RDX: ffff983cff75b780 RSI: 0000000000a90101 RDI: ffff982c9fc8c480 May 01 23:43:23 fir-md1-s1 kernel: RBP: ffff982bc9ed78e8 R08: ffff982cff01b780 R09: 0000000000000000 May 01 23:43:23 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff982cfef9ac00 May 01 23:43:23 fir-md1-s1 kernel: R13: ffff984cba2390a8 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:43:23 fir-md1-s1 kernel: FS: 00007f1fd5eaa700(0000) GS:ffff982cff000000(0000) knlGS:0000000000000000 May 01 23:43:23 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:23 fir-md1-s1 kernel: CR2: 00007f41ebfcd1b0 CR3: 0000001038b88000 CR4: 00000000003407e0 May 01 23:43:23 fir-md1-s1 kernel: Call Trace: May 01 23:43:23 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:23 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:23 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:23 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:23 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? load_balance+0x178/0x9a0 May 01 23:43:23 fir-md1-s1 kernel: [] ? 
update_curr+0x14c/0x1e0 May 01 23:43:23 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:23 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:23 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:23 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:23 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:23 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:23 fir-md1-s1 kernel: Code: May 01 23:43:23 fir-md1-s1 kernel: 13 May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: c1 May 01 23:43:23 fir-md1-s1 kernel: ea May 01 23:43:23 fir-md1-s1 kernel: 0d May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: 98 May 01 23:43:23 fir-md1-s1 kernel: 83 May 01 23:43:23 fir-md1-s1 kernel: e2 May 01 23:43:23 fir-md1-s1 kernel: 30 May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: 81 May 01 23:43:23 fir-md1-s1 kernel: c2 May 01 23:43:23 fir-md1-s1 kernel: 80 May 01 23:43:23 fir-md1-s1 kernel: b7 May 01 23:43:23 fir-md1-s1 kernel: 01 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: 48 May 01 23:43:23 fir-md1-s1 kernel: 03 May 01 23:43:23 fir-md1-s1 kernel: 14 May 01 23:43:23 fir-md1-s1 kernel: c5 May 01 23:43:23 fir-md1-s1 kernel: 60 May 01 23:43:23 fir-md1-s1 kernel: b9 May 01 23:43:23 fir-md1-s1 kernel: b4 May 01 23:43:23 fir-md1-s1 kernel: b7 May 01 23:43:23 fir-md1-s1 kernel: 4c May 01 23:43:23 fir-md1-s1 kernel: 89 May 01 23:43:23 fir-md1-s1 kernel: 02 May 01 23:43:23 fir-md1-s1 kernel: 41 May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 40 May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c0 May 01 23:43:23 fir-md1-s1 kernel: 75 May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 1f May 01 23:43:23 fir-md1-s1 kernel: 44 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: 00 May 01 23:43:23 fir-md1-s1 kernel: f3 May 01 23:43:23 fir-md1-s1 kernel: 90 May 01 23:43:23 fir-md1-s1 kernel: <41> May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 40 May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c0 May 01 23:43:23 fir-md1-s1 kernel: 74 May 01 23:43:23 fir-md1-s1 
kernel: f6 May 01 23:43:23 fir-md1-s1 kernel: 4d May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: 08 May 01 23:43:23 fir-md1-s1 kernel: 4d May 01 23:43:23 fir-md1-s1 kernel: 85 May 01 23:43:23 fir-md1-s1 kernel: c9 May 01 23:43:23 fir-md1-s1 kernel: 74 May 01 23:43:23 fir-md1-s1 kernel: 04 May 01 23:43:23 fir-md1-s1 kernel: 41 May 01 23:43:23 fir-md1-s1 kernel: 0f May 01 23:43:23 fir-md1-s1 kernel: 18 May 01 23:43:23 fir-md1-s1 kernel: 09 May 01 23:43:23 fir-md1-s1 kernel: 8b May 01 23:43:23 fir-md1-s1 kernel: May 01 23:43:23 fir-md1-s1 kernel: [last unloaded: libcfs] May 01 23:43:23 fir-md1-s1 kernel: May 01 23:43:23 fir-md1-s1 kernel: CPU: 2 PID: 101733 Comm: mdt_io02_001 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.1.3.el7_lustre.x86_64 #1 May 01 23:43:23 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.6.7 10/29/2018 May 01 23:43:23 fir-md1-s1 kernel: task: ffff982cf0b630c0 ti: ffff984cfa370000 task.ti: ffff984cfa370000 May 01 23:43:23 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 May 01 23:43:23 fir-md1-s1 kernel: RSP: 0018:ffff984cfa3738e8 EFLAGS: 00000246 May 01 23:43:23 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffff985c7a711b78 RCX: 0000000000110000 May 01 23:43:23 fir-md1-s1 kernel: RDX: ffff985d3f4db780 RSI: 0000000000790101 RDI: ffff982c9fc8c480 May 01 23:43:23 fir-md1-s1 kernel: RBP: ffff984cfa3738e8 R08: ffff984cff61b780 R09: 0000000000000000 May 01 23:43:23 fir-md1-s1 kernel: R10: 0000000000000000 R11: ffff985c8b319038 R12: ffff984cff61ac00 May 01 23:43:23 fir-md1-s1 kernel: R13: ffff982cf0b63128 R14: 00ffffffb7a08d80 R15: ffff984cf34000a0 May 01 23:43:23 fir-md1-s1 kernel: FS: 00007f759e3eb740(0000) GS:ffff984cff600000(0000) knlGS:0000000000000000 May 01 23:43:23 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 May 01 23:43:23 fir-md1-s1 kernel: CR2: 00007f759e3fa000 CR3: 00000012f7610000 CR4: 00000000003407e0 May 01 23:43:23 fir-md1-s1 
kernel: Call Trace: May 01 23:43:23 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf May 01 23:43:23 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 May 01 23:43:23 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x210/0x700 [ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 May 01 23:43:23 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] May 01 23:43:23 fir-md1-s1 kernel: [] mdt_obd_preprw+0x637/0x1060 [mdt] May 01 23:43:23 fir-md1-s1 kernel: [] tgt_brw_write+0xc7e/0x1a90 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 May 01 23:43:23 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 May 01 23:43:23 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 May 01 23:43:23 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] May 01 23:43:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? default_wake_function+0x12/0x20 May 01 23:43:23 fir-md1-s1 kernel: [] ? __wake_up_common+0x5b/0x90 May 01 23:43:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] May 01 23:43:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 01 23:43:23 fir-md1-s1 kernel: [] ? 
insert_kthread_work+0x40/0x40 May 01 23:43:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 01 23:43:23 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 May 01 23:43:23 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 60 b9 b4 b7 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b May 01 23:43:24 fir-md1-s1 kernel: Lustre: 103134:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:58s); client may timeout. req@ffff984b11785c50 x1631585929788768/t306055732550(0) o4->16749711-2a27-479b-83fc-14b2199ba6af@10.9.104.18@o2ib4:26/0 lens 8680/416 e 1 to 0 dl 1556779346 ref 1 fl Complete:/0/0 rc 0/0 May 01 23:43:24 fir-md1-s1 kernel: LustreError: 103039:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.8.24.34@o2ib6: deadline 30:1s ago req@ffff984a87ac4800 x1631778399965216/t0(0) o35->b1ac7951-67b3-5d05-244d-b23c643bc210@10.8.24.34@o2ib6:23/0 lens 392/0 e 0 to 0 dl 1556779403 ref 1 fl Interpret:/0/ffffffff rc 0/-1 May 01 23:43:24 fir-md1-s1 kernel: Lustre: 103134:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1263 previous similar messages May 01 23:43:26 fir-md1-s1 kernel: Lustre: 102928:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556779385/real 1556779385] req@ffff98555c27d400 x1632254604141824/t0(0) o601->fir-MDT0000-lwp-MDT0002@0@lo:23/10 lens 336/336 e 1 to 1 dl 1556779406 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1 May 01 23:43:26 fir-md1-s1 kernel: Lustre: 102928:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 01 23:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f37c3da1-0e56-86e1-dca2-c29b3ae80868 (at 10.9.112.9@o2ib4) reconnecting May 01 23:43:38 fir-md1-s1 kernel: Lustre: Skipped 351 previous similar messages May 01 23:43:38 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Connection restored to (at 10.9.112.9@o2ib4) May 01 23:43:38 fir-md1-s1 kernel: Lustre: Skipped 354 previous similar messages May 01 23:56:15 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Wed May 1 23:56:15 2019 May 01 23:57:11 fir-md1-s1 kernel: bash (34456): drop_caches: 2 May 02 00:57:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9846712fec00, cur 1556783856 expire 1556783706 last 1556783629 May 02 00:57:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:04:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 01:04:44 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 02 01:08:31 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a183609f-9160-1c96-bd22-c38cef5f6dc6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c89518c00, cur 1556784511 expire 1556784361 last 1556784284 May 02 01:08:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:08:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bb5df75e-47d2-6116-c28b-57643841d372 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983523331800, cur 1556784514 expire 1556784364 last 1556784287 May 02 01:08:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bb5df75e-47d2-6116-c28b-57643841d372 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982abc7f7400, cur 1556784523 expire 1556784373 last 1556784296 May 02 01:09:22 fir-md1-s1 kernel: Lustre: 102475:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 01:09:22 fir-md1-s1 kernel: Lustre: 102475:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 31 previous similar messages May 02 01:12:31 fir-md1-s1 kernel: Lustre: 102498:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 01:12:31 fir-md1-s1 kernel: Lustre: 102498:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 68 previous similar messages May 02 01:12:34 fir-md1-s1 kernel: Lustre: 102707:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 01:12:34 fir-md1-s1 kernel: Lustre: 102707:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 21 previous similar messages May 02 01:15:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 01:15:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:19:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 01:19:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:19:21 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 01:19:21 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages May 02 01:19:28 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 71707ce4-dd3f-f05e-1770-6071c553d770 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984acc2cdc00, cur 1556785168 expire 1556785018 last 1556784941 May 02 01:19:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ff37667d-58c6-941f-43b8-fbc08903c9b3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847eaa56000, cur 1556785172 expire 1556785022 last 1556784945 May 02 01:19:36 fir-md1-s1 kernel: Lustre: 102388:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 01:19:36 fir-md1-s1 kernel: Lustre: 102388:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 46 previous similar messages May 02 01:19:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ff37667d-58c6-941f-43b8-fbc08903c9b3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983c4afff000, cur 1556785181 expire 1556785031 last 1556784954 May 02 01:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ea0ec5c3-6829-d0ee-e1d8-43c3a991f5ee (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982b59978c00, cur 1556785518 expire 1556785368 last 1556785291 May 02 01:25:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 01:25:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:26:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 90508cea-f23b-ead6-21b6-eb1abb6d51cd (at 10.8.23.14@o2ib6) in 177 seconds. I think it's dead, and I am evicting it. 
exp ffff985953781000, cur 1556785594 expire 1556785444 last 1556785417 May 02 01:26:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:27:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 02 01:27:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:29:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a5e1a8a9-e2c7-2e1d-57da-7e929a098597 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9846bde12000, cur 1556785750 expire 1556785600 last 1556785523 May 02 01:29:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:29:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 01:29:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:38:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8d664866-17c7-7f44-31eb-66dff3b61f71 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9846d2ff9c00, cur 1556786331 expire 1556786181 last 1556786104 May 02 01:38:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 01:39:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 01:39:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 03:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aec68a6a-7cc2-5a83-a3fa-b45b5d00f2f3 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983c53b91800, cur 1556791739 expire 1556791589 last 1556791512 May 02 03:08:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 03:09:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 02 03:09:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 03:19:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0bd5061e-994e-558c-c820-3ad8bf31cfa8 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9846b87d3400, cur 1556792397 expire 1556792247 last 1556792170 May 02 03:19:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 03:20:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 02 03:20:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 03:23:53 fir-md1-s1 kernel: Lustre: 102498:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 03:23:53 fir-md1-s1 kernel: Lustre: 102498:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 136 previous similar messages May 02 03:52:17 fir-md1-s1 kernel: Lustre: 102572:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 03:59:54 fir-md1-s1 kernel: Lustre: 102532:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 03:59:54 fir-md1-s1 kernel: Lustre: 102532:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 21 previous similar messages May 02 04:02:13 fir-md1-s1 kernel: Lustre: 102572:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 04:02:13 fir-md1-s1 kernel: Lustre: 102572:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages May 02 04:07:38 fir-md1-s1 kernel: Lustre: 
101683:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 04:07:38 fir-md1-s1 kernel: Lustre: 101683:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages May 02 04:12:07 fir-md1-s1 kernel: Lustre: 102394:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 04:12:07 fir-md1-s1 kernel: Lustre: 102394:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages May 02 04:18:47 fir-md1-s1 kernel: Lustre: 101903:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 04:18:47 fir-md1-s1 kernel: Lustre: 101903:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages May 02 04:26:31 fir-md1-s1 kernel: Lustre: 102593:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 04:26:31 fir-md1-s1 kernel: Lustre: 102593:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages May 02 04:42:51 fir-md1-s1 kernel: Lustre: 102479:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 04:42:51 fir-md1-s1 kernel: Lustre: 102479:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages May 02 04:48:59 fir-md1-s1 kernel: Lustre: 102475:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 04:48:59 fir-md1-s1 kernel: Lustre: 102475:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16 previous similar messages May 02 04:54:02 fir-md1-s1 kernel: Lustre: 102527:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 04:54:02 fir-md1-s1 kernel: Lustre: 102527:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 32 previous similar messages May 02 05:01:26 fir-md1-s1 kernel: Lustre: 
102532:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 05:01:26 fir-md1-s1 kernel: Lustre: 102532:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages May 02 05:05:59 fir-md1-s1 kernel: Lustre: 102707:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 05:05:59 fir-md1-s1 kernel: Lustre: 102707:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages May 02 05:08:58 fir-md1-s1 kernel: Lustre: 101902:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 05:08:58 fir-md1-s1 kernel: Lustre: 101902:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages May 02 05:17:07 fir-md1-s1 kernel: Lustre: 101683:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 05:17:07 fir-md1-s1 kernel: Lustre: 101683:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 23 previous similar messages May 02 05:24:54 fir-md1-s1 kernel: Lustre: 102768:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 05:24:54 fir-md1-s1 kernel: Lustre: 102768:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 56 previous similar messages May 02 05:29:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3fbaa703-3adf-9bb5-3d07-350b21402455 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984c2aea5400, cur 1556800148 expire 1556799998 last 1556799921 May 02 05:29:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 05:29:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 02 05:29:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 05:44:37 fir-md1-s1 kernel: Lustre: 101902:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 05:44:37 fir-md1-s1 kernel: Lustre: 101902:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 57 previous similar messages May 02 05:53:09 fir-md1-s1 kernel: Lustre: 102698:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 05:53:09 fir-md1-s1 kernel: Lustre: 102698:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 296 previous similar messages May 02 05:56:50 fir-md1-s1 kernel: Lustre: 102549:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 05:56:50 fir-md1-s1 kernel: Lustre: 102549:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 24 previous similar messages May 02 06:01:47 fir-md1-s1 kernel: Lustre: 102422:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 06:01:47 fir-md1-s1 kernel: Lustre: 102422:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 25 previous similar messages May 02 06:24:07 fir-md1-s1 kernel: Lustre: 102473:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 06:24:07 fir-md1-s1 kernel: Lustre: 102473:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 33 previous similar messages May 02 06:25:18 fir-md1-s1 kernel: Lustre: 102545:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 06:25:18 fir-md1-s1 kernel: Lustre: 
102545:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 38 previous similar messages May 02 06:27:38 fir-md1-s1 kernel: Lustre: 102708:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 06:27:38 fir-md1-s1 kernel: Lustre: 102708:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 68 previous similar messages May 02 06:35:56 fir-md1-s1 kernel: Lustre: 102672:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 06:35:56 fir-md1-s1 kernel: Lustre: 102672:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 73 previous similar messages May 02 06:45:02 fir-md1-s1 kernel: Lustre: 102481:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 06:45:02 fir-md1-s1 kernel: Lustre: 102481:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 36 previous similar messages May 02 07:23:29 fir-md1-s1 kernel: Lustre: 102456:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 07:23:29 fir-md1-s1 kernel: Lustre: 102456:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 45 previous similar messages May 02 07:32:38 fir-md1-s1 kernel: Lustre: 102672:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 07:32:38 fir-md1-s1 kernel: Lustre: 102672:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages May 02 07:35:16 fir-md1-s1 kernel: Lustre: 102527:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 07:35:16 fir-md1-s1 kernel: Lustre: 102527:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 67 previous similar messages May 02 07:46:20 fir-md1-s1 kernel: Lustre: 102504:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 08:00:17 fir-md1-s1 kernel: Lustre: 
102398:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 08:00:17 fir-md1-s1 kernel: Lustre: 102398:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 111 previous similar messages May 02 08:12:25 fir-md1-s1 kernel: Lustre: 102498:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 08:12:25 fir-md1-s1 kernel: Lustre: 102498:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 32 previous similar messages May 02 08:24:49 fir-md1-s1 kernel: Lustre: 102478:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 08:24:49 fir-md1-s1 kernel: Lustre: 102478:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 57 previous similar messages May 02 08:38:55 fir-md1-s1 kernel: Lustre: 102501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 08:38:55 fir-md1-s1 kernel: Lustre: 102501:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 75 previous similar messages May 02 08:51:41 fir-md1-s1 kernel: Lustre: 102722:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 08:51:41 fir-md1-s1 kernel: Lustre: 102722:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 45 previous similar messages May 02 09:31:21 fir-md1-s1 kernel: Lustre: 102739:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 09:31:21 fir-md1-s1 kernel: Lustre: 102739:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16 previous similar messages May 02 09:34:37 fir-md1-s1 kernel: Lustre: 102451:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 09:41:13 fir-md1-s1 kernel: Lustre: 102502:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 09:41:13 
fir-md1-s1 kernel: Lustre: 102502:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14 previous similar messages May 02 09:51:16 fir-md1-s1 kernel: Lustre: 102546:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 09:51:16 fir-md1-s1 kernel: Lustre: 102546:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28 previous similar messages May 02 10:03:23 fir-md1-s1 kernel: Lustre: 102546:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 10:03:23 fir-md1-s1 kernel: Lustre: 102546:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 123 previous similar messages May 02 10:13:43 fir-md1-s1 kernel: Lustre: 102562:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 10:13:43 fir-md1-s1 kernel: Lustre: 102562:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 50 previous similar messages May 02 10:25:22 fir-md1-s1 kernel: perf: interrupt took too long (3961 > 3912), lowering kernel.perf_event_max_sample_rate to 50000 May 02 10:33:06 fir-md1-s1 kernel: Lustre: 102567:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 10:33:06 fir-md1-s1 kernel: Lustre: 102567:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 53 previous similar messages May 02 10:43:52 fir-md1-s1 kernel: Lustre: 102501:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 10:43:52 fir-md1-s1 kernel: Lustre: 102501:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 57 previous similar messages May 02 10:53:52 fir-md1-s1 kernel: Lustre: 102634:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 10:53:52 fir-md1-s1 kernel: Lustre: 102634:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2617 previous similar messages May 02 11:03:42 fir-md1-s1 kernel: 
Lustre: fir-MDT0000: haven't heard from client 44af0c56-5a88-a72b-6045-a7b009a95d81 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98496fb3c400, cur 1556820222 expire 1556820072 last 1556819995 May 02 11:03:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:11:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 11:11:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:23:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ddbaacf-8a30-33af-ce74-60975dfc2df4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ecf2e7800, cur 1556821401 expire 1556821251 last 1556821174 May 02 11:23:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:23:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4ddbaacf-8a30-33af-ce74-60975dfc2df4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847c1a5a800, cur 1556821410 expire 1556821260 last 1556821183 May 02 11:23:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 11:24:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 11:24:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:33:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 172c526d-c0f5-9a2a-9f65-ccfb5a20ff9a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982ec1e04400, cur 1556822033 expire 1556821883 last 1556821806 May 02 11:33:58 fir-md1-s1 kernel: Lustre: 102493:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 11:33:58 fir-md1-s1 kernel: Lustre: 102493:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2586 previous similar messages May 02 11:34:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 11:34:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:37:17 fir-md1-s1 kernel: Lustre: 102400:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 11:37:17 fir-md1-s1 kernel: Lustre: 102400:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 522 previous similar messages May 02 11:39:47 fir-md1-s1 kernel: Lustre: 102394:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 11:39:47 fir-md1-s1 kernel: Lustre: 102394:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1265 previous similar messages May 02 11:41:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 58f926d8-2802-f47d-7c08-c8978d4a4d11 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983642355000, cur 1556822472 expire 1556822322 last 1556822245 May 02 11:41:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:42:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 11:42:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:44:47 fir-md1-s1 kernel: Lustre: 101684:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 11:44:47 fir-md1-s1 kernel: Lustre: 101684:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5176 previous similar messages May 02 11:50:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c2b0b8a6-df91-5479-ef43-e8943f21239d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983985bc7000, cur 1556823009 expire 1556822859 last 1556822782 May 02 11:50:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:50:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 11:50:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 11:54:47 fir-md1-s1 kernel: Lustre: 102754:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 11:54:47 fir-md1-s1 kernel: Lustre: 102754:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12438 previous similar messages May 02 12:05:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6aeb6079-402b-3bb1-a2d3-c82f7dedb64c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982520b40c00, cur 1556823914 expire 1556823764 last 1556823687 May 02 12:05:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:06:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 12:06:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:11:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9eff0b4e-3afc-f49d-130c-6037f4f08b84 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9820cfac2800, cur 1556824309 expire 1556824159 last 1556824082 May 02 12:11:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:11:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d3d79854-ad1c-702f-3855-225ef3b11dfd (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985517788400, cur 1556824310 expire 1556824160 last 1556824083 May 02 12:11:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 12:12:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 12:12:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:17:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e1793f57-a4ba-6d03-0fea-3e33901bbb9a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c49284c00, cur 1556824648 expire 1556824498 last 1556824421 May 02 12:17:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e1793f57-a4ba-6d03-0fea-3e33901bbb9a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982f68e04400, cur 1556824655 expire 1556824505 last 1556824428 May 02 12:17:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 12:21:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 12:21:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:26:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 93ac24e1-823a-ff22-9381-5377a6e67cd8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9843bea7d800, cur 1556825175 expire 1556825025 last 1556824948 May 02 12:28:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 12:28:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:34:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 47cd7d71-22c2-28c6-529e-0733b434823f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9826de31ec00, cur 1556825686 expire 1556825536 last 1556825459 May 02 12:34:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:35:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 12:35:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:40:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c9762523-e50e-5d2d-7f55-8a388512d616 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982e9b6e2c00, cur 1556826038 expire 1556825888 last 1556825811 May 02 12:40:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:44:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 12:44:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:47:36 fir-md1-s1 kernel: Lustre: 102744:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 12:47:36 fir-md1-s1 kernel: Lustre: 102744:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2244 previous similar messages May 02 12:56:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bd662612-7c6d-b660-6da2-0730c6cccb0c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983524e0f400, cur 1556826965 expire 1556826815 last 1556826738 May 02 12:56:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 12:59:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 12:59:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 13:04:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 45ccfd2e-c9a2-5159-cfd1-e5d0ed6a8547 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984bd661f000, cur 1556827487 expire 1556827337 last 1556827260 May 02 13:04:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 13:05:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 13:05:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 13:08:59 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556827732/real 1556827732] req@ffff982b0ae4d400 x1632261391650288/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556827739 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 13:09:06 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556827739/real 1556827739] req@ffff982b0ae4d400 x1632261391650288/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556827746 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:09:06 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 02 13:09:07 fir-md1-s1 kernel: Lustre: 102623:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982e9362f500 x1631740646263920/t0(0) o101->6e7b2a9e-9424-90c5-3654-c5764689cf5a@10.8.11.24@o2ib6:12/0 lens 576/3264 e 1 to 0 dl 1556827752 ref 2 fl Interpret:/0/0 rc 0/0 May 02 13:09:07 fir-md1-s1 kernel: Lustre: 102623:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1217 previous similar messages May 02 13:09:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6e7b2a9e-9424-90c5-3654-c5764689cf5a (at 10.8.11.24@o2ib6) reconnecting May 02 13:09:13 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 02 13:09:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.11.24@o2ib6) May 02 13:09:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous 
similar messages May 02 13:09:20 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556827753/real 1556827753] req@ffff982b0ae4d400 x1632261391650288/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556827760 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:09:20 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 02 13:09:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6e7b2a9e-9424-90c5-3654-c5764689cf5a (at 10.8.11.24@o2ib6) reconnecting May 02 13:09:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.11.24@o2ib6) May 02 13:09:41 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556827774/real 1556827774] req@ffff982b0ae4d400 x1632261391650288/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556827781 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:09:41 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 02 13:09:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6e7b2a9e-9424-90c5-3654-c5764689cf5a (at 10.8.11.24@o2ib6) reconnecting May 02 13:09:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.11.24@o2ib6) May 02 13:10:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6f24849a-5082-4c56-5222-ba0806df8317 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983af7643000, cur 1556827809 expire 1556827659 last 1556827582 May 02 13:10:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 13:13:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 13:18:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 381cf436-f89c-8581-2b6b-48bc0bd0427a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9841b6fd3000, cur 1556828324 expire 1556828174 last 1556828097 May 02 13:18:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 13:19:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 13:19:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 13:24:13 fir-md1-s1 kernel: Lustre: 102673:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556828646/real 1556828646] req@ffff984c65e15a00 x1632261621340896/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556828653 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 13:24:13 fir-md1-s1 kernel: Lustre: 102673:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 02 13:24:20 fir-md1-s1 kernel: Lustre: 102673:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556828653/real 1556828653] req@ffff984c65e15a00 x1632261621340896/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556828660 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:24:20 fir-md1-s1 kernel: Lustre: 102673:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 02 13:24:22 fir-md1-s1 kernel: Lustre: 102634:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9823972cda00 x1631694454225920/t0(0) 
o101->55f4b4b8-3c65-8c0f-db1c-2b3dc27b8988@10.8.30.17@o2ib6:27/0 lens 1776/3288 e 1 to 0 dl 1556828667 ref 2 fl Interpret:/0/0 rc 0/0 May 02 13:24:28 fir-md1-s1 kernel: Lustre: 101685:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556828661/real 1556828661] req@ffff982648b45d00 x1632261621701280/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556828668 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:24:28 fir-md1-s1 kernel: Lustre: 101685:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 02 13:24:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 55f4b4b8-3c65-8c0f-db1c-2b3dc27b8988 (at 10.8.30.17@o2ib6) reconnecting May 02 13:24:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.30.17@o2ib6) May 02 13:24:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 13:24:48 fir-md1-s1 kernel: Lustre: 102673:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556828681/real 1556828681] req@ffff984c65e15a00 x1632261621340896/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556828688 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:24:48 fir-md1-s1 kernel: Lustre: 102673:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 02 13:24:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 55f4b4b8-3c65-8c0f-db1c-2b3dc27b8988 (at 10.8.30.17@o2ib6) reconnecting May 02 13:25:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 55f4b4b8-3c65-8c0f-db1c-2b3dc27b8988 (at 10.8.30.17@o2ib6) reconnecting May 02 13:25:23 fir-md1-s1 kernel: Lustre: 102673:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556828716/real 1556828716] req@ffff984c65e15a00 x1632261621340896/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556828723 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:25:23 
fir-md1-s1 kernel: Lustre: 102673:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 02 13:25:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 55f4b4b8-3c65-8c0f-db1c-2b3dc27b8988 (at 10.8.30.17@o2ib6) reconnecting May 02 13:26:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 55f4b4b8-3c65-8c0f-db1c-2b3dc27b8988 (at 10.8.30.17@o2ib6) reconnecting May 02 13:26:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 13:26:26 fir-md1-s1 kernel: LustreError: 102673:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) returned error from blocking AST (req@ffff984c65e15a00 x1632261621340896 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff983583399200/0xce88538dbaa64123 lrc: 4/0,0 mode: PR/PR res: [0x2c001c07e:0x209e:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x64590add1d78a1a0 expref: 133 pid: 102693 timeout: 354207 lvb_type: 0 May 02 13:26:26 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 02 13:26:26 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 140s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff983583399200/0xce88538dbaa64123 lrc: 3/0,0 mode: PR/PR res: [0x2c001c07e:0x209e:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x64590add1d78a1a0 expref: 134 pid: 102693 timeout: 0 lvb_type: 0 May 02 13:27:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bcd1f643-86fc-1a53-19ae-6fe448ff66ff (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98291aad3400, cur 1556828832 expire 1556828682 last 1556828605 May 02 13:27:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 13:27:59 fir-md1-s1 kernel: Lustre: 102734:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556828872/real 1556828872] req@ffff9827bc79e900 x1632261677662784/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556828879 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 13:27:59 fir-md1-s1 kernel: Lustre: 102734:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages May 02 13:28:10 fir-md1-s1 kernel: Lustre: 101711:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/4), not sending early reply req@ffff983613375400 x1631769647024176/t0(0) o101->3825c1ac-ec37-9b30-c321-9c7f508c61a5@10.8.26.3@o2ib6:14/0 lens 576/3264 e 1 to 0 dl 1556828894 ref 2 fl Interpret:/0/0 rc 0/0 May 02 13:28:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 3825c1ac-ec37-9b30-c321-9c7f508c61a5 (at 10.8.26.3@o2ib6) reconnecting May 02 13:28:17 fir-md1-s1 kernel: Lustre: 102732:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98332912d100 x1631683884345424/t0(0) o101->7da2364c-273e-9791-279a-dee1848c518b@10.8.25.6@o2ib6:22/0 lens 576/3264 e 0 to 0 dl 1556828902 ref 2 fl Interpret:/0/0 rc 0/0 May 02 13:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 30914fa9-f16d-1c3c-9f79-80d10d6d2efb (at 10.8.25.6@o2ib6) May 02 13:28:54 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages May 02 13:29:22 fir-md1-s1 kernel: LustreError: 101685:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556828872, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9822a2cb5e80/0xce88538dc206f162 lrc: 3/1,0 mode: --/PR res: [0x2c001bf0d:0x1d497:0x0].0x0 bits 0x13/0x0 
rrc: 21 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 101685 timeout: 0 lvb_type: 0 May 02 13:29:22 fir-md1-s1 kernel: LustreError: 101685:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages May 02 13:30:12 fir-md1-s1 kernel: Lustre: 102734:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556829005/real 1556829005] req@ffff9827bc79e900 x1632261677662784/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556829012 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:30:12 fir-md1-s1 kernel: Lustre: 102734:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 37 previous similar messages May 02 13:30:27 fir-md1-s1 kernel: LustreError: 102734:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) failed to reply to blocking AST (req@ffff9827bc79e900 x1632261677662784 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff983629f7b600/0xce88538dc08e7001 lrc: 4/0,0 mode: PR/PR res: [0x2c001bf0d:0x1d497:0x0].0x0 bits 0x13/0x0 rrc: 24 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xf6c54c7e87827c63 expref: 132 pid: 102736 timeout: 354440 lvb_type: 0 May 02 13:30:27 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 02 13:30:27 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff983629f7b600/0xce88538dc08e7001 lrc: 3/0,0 mode: PR/PR res: [0x2c001bf0d:0x1d497:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xf6c54c7e87827c63 expref: 133 pid: 102736 timeout: 0 lvb_type: 0 May 02 13:31:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 890e4d16-d7c3-3317-6d52-b0c9fa5bef3f (at 10.8.26.4@o2ib6) in 227 
seconds. I think it's dead, and I am evicting it. exp ffff982f066f2c00, cur 1556829062 expire 1556828912 last 1556828835 May 02 13:31:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 13:34:19 fir-md1-s1 kernel: Lustre: 102716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff981d07e8b900 x1631677350402736/t0(0) o101->5386ec54-15c8-ff0c-b06f-9f106d71db67@10.8.24.11@o2ib6:23/0 lens 1784/3288 e 0 to 0 dl 1556829263 ref 2 fl Interpret:/0/0 rc 0/0 May 02 13:34:19 fir-md1-s1 kernel: Lustre: 102716:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 02 13:34:23 fir-md1-s1 kernel: Lustre: 102448:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9848aaaba100 x1631313698918160/t0(0) o101->ddef0525-fd05-baf0-eec8-55af7a82431b@10.8.24.4@o2ib6:28/0 lens 576/3264 e 0 to 0 dl 1556829268 ref 2 fl Interpret:/0/0 rc 0/0 May 02 13:34:23 fir-md1-s1 kernel: Lustre: 102448:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 02 13:34:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5386ec54-15c8-ff0c-b06f-9f106d71db67 (at 10.8.24.11@o2ib6) reconnecting May 02 13:34:25 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 02 13:34:55 fir-md1-s1 kernel: Lustre: 101912:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556829264/real 1556829264] req@ffff982670ac0c00 x1632261772249008/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556829295 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 13:34:55 fir-md1-s1 kernel: Lustre: 101912:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 02 13:35:24 fir-md1-s1 kernel: LustreError: 102672:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556829234, 90s ago); not entering recovery 
in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff98320027a880/0xce88538dca598966 lrc: 3/1,0 mode: --/PR res: [0x2c001bf0d:0x1d49d:0x0].0x0 bits 0x13/0x0 rrc: 23 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102672 timeout: 0 lvb_type: 0 May 02 13:35:24 fir-md1-s1 kernel: LustreError: 102672:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages May 02 13:36:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a8b9138a-75f6-8796-bab3-20790c25867e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c9f03c400, cur 1556829373 expire 1556829223 last 1556829146 May 02 13:36:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 13:40:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 13:40:54 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages May 02 13:46:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 72fe2597-0f48-cad2-5e45-8d7f741709bb (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981f7a9bc000, cur 1556830010 expire 1556829860 last 1556829783 May 02 13:46:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 02 13:53:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 13:53:05 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 02 13:57:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7ee2e394-da6f-afb6-ffd2-05fe6382c964 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982777f37800, cur 1556830624 expire 1556830474 last 1556830397 May 02 13:57:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 02 14:11:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c4e75ec6-3deb-55b0-3f43-268cf1ca9b51 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982e33201400, cur 1556831502 expire 1556831352 last 1556831275 May 02 14:11:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 14:12:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 14:12:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 02 15:04:26 fir-md1-s1 kernel: Lustre: 102593:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 15:04:26 fir-md1-s1 kernel: Lustre: 102593:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 101 previous similar messages May 02 15:06:59 fir-md1-s1 kernel: Lustre: 102648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556834811/real 1556834811] req@ffff983ce8796600 x1632263204964752/t0(0) o104->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556834818 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 15:06:59 fir-md1-s1 kernel: Lustre: 102648:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 02 15:07:06 fir-md1-s1 kernel: Lustre: 102566:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98359138d400 x1631774131023440/t0(0) o101->2c24b658-0f5e-02f4-4639-616535a9fa56@10.8.20.3@o2ib6:11/0 lens 1808/3288 e 1 to 0 dl 1556834831 ref 2 fl Interpret:/0/0 rc 0/0 May 02 15:07:07 fir-md1-s1 kernel: Lustre: 102759:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9822eefa5100 x1631750720415472/t0(0) 
o101->ba47e349-0702-f6f0-1080-5cd761e580b9@10.8.24.25@o2ib6:12/0 lens 576/3264 e 1 to 0 dl 1556834832 ref 2 fl Interpret:/0/0 rc 0/0 May 02 15:07:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2c24b658-0f5e-02f4-4639-616535a9fa56 (at 10.8.20.3@o2ib6) reconnecting May 02 15:07:13 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 02 15:07:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2a3d61b1-fd25-9a55-f10a-5d6b6c2fa148 (at 10.8.20.3@o2ib6) May 02 15:07:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 02 15:07:18 fir-md1-s1 kernel: Lustre: 102685:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98414c6b7b00 x1631610420828096/t0(0) o101->c55a1e29-63e2-1cd7-0e7e-0d12f0df0a6c@10.8.30.1@o2ib6:23/0 lens 576/3264 e 0 to 0 dl 1556834843 ref 2 fl Interpret:/0/0 rc 0/0 May 02 15:07:18 fir-md1-s1 kernel: Lustre: 102685:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 02 15:07:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2c24b658-0f5e-02f4-4639-616535a9fa56 (at 10.8.20.3@o2ib6) reconnecting May 02 15:07:55 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 02 15:08:09 fir-md1-s1 kernel: Lustre: 102648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556834882/real 1556834882] req@ffff983ce8796600 x1632263204964752/t0(0) o104->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556834889 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 15:08:09 fir-md1-s1 kernel: Lustre: 102648:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 02 15:08:22 fir-md1-s1 kernel: LustreError: 102441:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556834812, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff98354f2a7740/0xce88538e49bf7c05 
lrc: 3/1,0 mode: --/PR res: [0x200003fd9:0x12927:0x0].0x0 bits 0x13/0x0 rrc: 24 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102441 timeout: 0 lvb_type: 0 May 02 15:08:22 fir-md1-s1 kernel: LustreError: 102441:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages May 02 15:08:23 fir-md1-s1 kernel: LustreError: 102713:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556834813, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff98221da05a00/0xce88538e49c661fd lrc: 3/1,0 mode: --/PR res: [0x200003fd9:0x12927:0x0].0x0 bits 0x13/0x0 rrc: 24 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102713 timeout: 0 lvb_type: 0 May 02 15:08:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2a3d61b1-fd25-9a55-f10a-5d6b6c2fa148 (at 10.8.20.3@o2ib6) May 02 15:08:37 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages May 02 15:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 35803f72-8a48-caae-4a4d-683dc7c32bbb (at 10.8.20.27@o2ib6) reconnecting May 02 15:08:59 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages May 02 15:09:26 fir-md1-s1 kernel: LustreError: 102648:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) failed to reply to blocking AST (req@ffff983ce8796600 x1632263204964752 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff984a23244c80/0xce88538e4837ea76 lrc: 4/0,0 mode: PR/PR res: [0x200003fd9:0x12927:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xb4fc36cf1dd32fec expref: 39 pid: 102352 timeout: 360379 lvb_type: 0 May 02 15:09:26 fir-md1-s1 kernel: LustreError: 102648:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 02 15:09:26 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.26.4@o2ib6 was evicted 
due to a lock blocking callback time out: rc -110 May 02 15:09:26 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 02 15:09:26 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff984a23244c80/0xce88538e4837ea76 lrc: 3/0,0 mode: PR/PR res: [0x200003fd9:0x12927:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xb4fc36cf1dd32fec expref: 40 pid: 102352 timeout: 0 lvb_type: 0 May 02 15:09:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 224573b7-e555-fcb4-9196-684f2aee08d5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981cd3c88400, cur 1556834989 expire 1556834839 last 1556834762 May 02 15:09:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 02 15:48:04 fir-md1-s1 kernel: Lustre: 102628:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 15:48:04 fir-md1-s1 kernel: Lustre: 102628:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 15:48:05 fir-md1-s1 kernel: Lustre: 102646:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 15:48:05 fir-md1-s1 kernel: Lustre: 102646:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 473 previous similar messages May 02 15:48:08 fir-md1-s1 kernel: Lustre: 101905:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 15:48:08 fir-md1-s1 kernel: Lustre: 101905:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 497 previous similar messages May 02 15:48:13 fir-md1-s1 kernel: Lustre: 102435:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 15:48:13 fir-md1-s1 kernel: Lustre: 
102435:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1892 previous similar messages May 02 15:48:22 fir-md1-s1 kernel: Lustre: 102376:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 15:48:22 fir-md1-s1 kernel: Lustre: 102376:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3472 previous similar messages May 02 15:48:41 fir-md1-s1 kernel: Lustre: 102493:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 02 15:48:41 fir-md1-s1 kernel: Lustre: 102493:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7648 previous similar messages May 02 16:28:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 16:28:55 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages May 02 16:29:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 111d3c6e-044c-2474-7739-f5cf178c4f50 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98466826d800, cur 1556839755 expire 1556839605 last 1556839528 May 02 16:29:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 16:47:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5ea8819a-828a-65cf-f59c-c1f9bbaca44f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c343be000, cur 1556840865 expire 1556840715 last 1556840638 May 02 16:47:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 16:47:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5ea8819a-828a-65cf-f59c-c1f9bbaca44f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98329c1b2000, cur 1556840874 expire 1556840724 last 1556840647 May 02 16:47:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 16:48:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 16:48:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 19:20:16 fir-md1-s1 kernel: Lustre: 102491:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9840f4ed3f00 x1631900091795200/t0(0) o101->9a8bc7f0-674a-721d-c255-50108001b9f0@10.8.0.66@o2ib6:21/0 lens 480/568 e 1 to 0 dl 1556850021 ref 2 fl Interpret:/0/0 rc 0/0 May 02 19:20:16 fir-md1-s1 kernel: Lustre: 102491:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 02 19:28:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8b37eb62-720c-7043-4839-bef9879e22f7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c5bfc5000, cur 1556850490 expire 1556850340 last 1556850263 May 02 19:28:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 19:28:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 19:41:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fd434721-6365-6185-32ff-00f1f3487de4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984189f00000, cur 1556851310 expire 1556851160 last 1556851083 May 02 19:41:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 19:41:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fd434721-6365-6185-32ff-00f1f3487de4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff981cd7d8e000, cur 1556851316 expire 1556851166 last 1556851089 May 02 19:42:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 19:42:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 19:58:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ee0a0f5b-1a45-2dc9-8ad3-be9445926234 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9824a1e3c400, cur 1556852316 expire 1556852166 last 1556852089 May 02 19:58:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 19:58:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 19:58:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:10:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5ade0ad4-3544-d046-38f3-e328f1c49cb6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984163722c00, cur 1556853018 expire 1556852868 last 1556852791 May 02 20:10:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:11:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 20:11:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 59acbc1a-7ffb-ea0d-d547-724bb9e2d549 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98320db78400, cur 1556853238 expire 1556853088 last 1556853011 May 02 20:13:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:14:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:14:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:18:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2db61525-0572-20c7-4bec-8eacb2081288 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98495d340000, cur 1556853511 expire 1556853361 last 1556853284 May 02 20:18:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:18:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:18:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:22:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb37cfba-87f3-87c0-46ed-5e5b8889c7b4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983c106f6800, cur 1556853749 expire 1556853599 last 1556853522 May 02 20:22:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:23:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 20:23:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:26:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0840f217-cd93-17cc-422e-1eb9c197e388 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983cc3f81400, cur 1556853976 expire 1556853826 last 1556853749 May 02 20:26:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:30:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:30:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:34:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 066d7298-dc05-7eb2-47bb-6555ef5cc631 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984890f59400, cur 1556854446 expire 1556854296 last 1556854219 May 02 20:34:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:34:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 20:34:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:41:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 276f680b-b322-5691-4bcf-a98e1795221f (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983a73673c00, cur 1556854887 expire 1556854737 last 1556854660 May 02 20:41:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:45:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:45:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:45:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 02 20:45:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 20:54:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8c2e7b34-1340-8b6e-9bb9-f4ee3d090d2e (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983cce51c000, cur 1556855683 expire 1556855533 last 1556855456 May 02 20:54:43 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 02 20:55:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:55:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:00:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0e969bb3-055a-5246-ded1-15bd8520d2e1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98295b08d400, cur 1556856017 expire 1556855867 last 1556855790 May 02 21:00:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:00:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0e969bb3-055a-5246-ded1-15bd8520d2e1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982e2de5b000, cur 1556856033 expire 1556855883 last 1556855806 May 02 21:00:33 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 02 21:00:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 21:00:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:01:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 21:01:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:28:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a886a0d6-d9f2-9bc7-ea16-457c939e6e92 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985ca3798400, cur 1556857721 expire 1556857571 last 1556857494 May 02 21:28:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 21:29:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 21:29:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:30:58 fir-md1-s1 kernel: Lustre: 102641:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984cac50bc00 x1631545207753808/t0(0) o101->2e220de5-7b5b-3874-45ea-64c959a50d0b@10.8.0.67@o2ib6:3/0 lens 1768/3288 e 1 to 0 dl 1556857863 ref 2 fl Interpret:/0/0 rc 0/0 May 02 21:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2e220de5-7b5b-3874-45ea-64c959a50d0b (at 10.8.0.67@o2ib6) reconnecting May 02 21:31:04 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 02 21:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.0.67@o2ib6) May 02 21:31:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:31:13 fir-md1-s1 kernel: LustreError: 102500:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 318234: error -110 May 02 21:31:18 fir-md1-s1 kernel: Lustre: 102531:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983a09f5a400 x1631547745615744/t0(0) o101->0e7d6cbd-2dc2-8104-92fb-8187f3b6e75a@10.8.8.11@o2ib6:23/0 lens 576/3264 e 1 to 0 dl 1556857883 ref 2 fl Interpret:/0/0 rc 0/0 May 02 21:35:05 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556858098/real 1556858098] req@ffff981d25a2c800 x1632269117903680/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556858105 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 21:35:05 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 02 21:35:13 
fir-md1-s1 kernel: Lustre: 102570:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b73a70300 x1632088234556096/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/568 e 1 to 0 dl 1556858118 ref 2 fl Interpret:/0/0 rc 0/0 May 02 21:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 02 21:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 02 21:35:26 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556858119/real 1556858119] req@ffff981d25a2c800 x1632269117903680/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556858126 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 21:35:26 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 02 21:35:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7ad6038c-e130-9e19-0552-3b91313cb0c0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98235bc0b000, cur 1556858138 expire 1556857988 last 1556857911 May 02 21:35:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:35:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 02 21:35:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7ad6038c-e130-9e19-0552-3b91313cb0c0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983724a79800, cur 1556858157 expire 1556858007 last 1556857930 May 02 21:35:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 02 21:36:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 21:46:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 89fe1949-4d94-3b7b-9566-994a2e556f6a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ecc666000, cur 1556858815 expire 1556858665 last 1556858588 May 02 21:47:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 21:47:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:57:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 21:57:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 21:57:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ca189637-f46d-b577-70ce-b9de98e9c123 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9840c2fd6800, cur 1556859479 expire 1556859329 last 1556859252 May 02 21:57:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:09:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1bf1e1e9-c78f-dcda-1c43-3ea5bf95d5b3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98313b742c00, cur 1556860177 expire 1556860027 last 1556859950 May 02 22:09:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:10:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 22:10:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:15:04 fir-md1-s1 kernel: Lustre: 102726:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9848a5788c00 x1631643231937968/t0(0) o101->e0e6d63f-0238-284f-ef41-faf2bb976ece@10.9.108.52@o2ib4:9/0 lens 1768/3288 e 1 to 0 dl 1556860509 ref 2 fl Interpret:/0/0 rc 0/0 May 02 22:15:06 fir-md1-s1 kernel: Lustre: 102623:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98398325b300 x1631676472609120/t0(0) o101->27ca7937-8a40-795a-1879-048dcb621b68@10.9.108.46@o2ib4:11/0 lens 1768/3288 e 1 to 0 dl 1556860511 ref 2 fl Interpret:/0/0 rc 0/0 May 02 22:15:06 fir-md1-s1 kernel: Lustre: 102623:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 02 22:18:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e4e678a0-7b8a-1717-2775-ddbcc8fc22df (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9823bff55c00, cur 1556860739 expire 1556860589 last 1556860512 May 02 22:18:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:20:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 22:20:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:20:05 fir-md1-s1 kernel: Lustre: 102564:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 02 22:20:05 fir-md1-s1 kernel: Lustre: 102564:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3084 previous similar messages May 02 22:43:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0b5f092b-97a6-9972-cdc1-5734afdc9cdd (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9846eefc4400, cur 1556862186 expire 1556862036 last 1556861959 May 02 22:43:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:43:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 22:43:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:46:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7884db4b-2231-5156-0f6e-711bf297b217 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984bb6311400, cur 1556862396 expire 1556862246 last 1556862169 May 02 22:46:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:47:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 22:51:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cfd376f5-99eb-64b3-852c-7ec755f9647e (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983b03a63800, cur 1556862678 expire 1556862528 last 1556862451 May 02 22:51:18 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 02 22:51:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 22:51:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:56:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 22:56:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 22:59:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f2785026-ba97-0a64-f4a9-63b058631860 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983835be8000, cur 1556863194 expire 1556863044 last 1556862967 May 02 22:59:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 23:06:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 527c5c76-e36c-6212-b0c9-7fb694ea6bf9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98235050fc00, cur 1556863566 expire 1556863416 last 1556863339 May 02 23:06:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 23:07:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 23:07:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 02 23:10:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 746bfd56-d0d2-3dc8-6c36-864fa752d244 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9840b37c7800, cur 1556863844 expire 1556863694 last 1556863617 May 02 23:10:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 23:15:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f46af3cd-c06c-a547-e772-8a943f29af08 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983209e42800, cur 1556864127 expire 1556863977 last 1556863900 May 02 23:15:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 23:20:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 36049bdb-9701-cbf2-c7a5-67dcc86922f0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98326ee79000, cur 1556864414 expire 1556864264 last 1556864187 May 02 23:20:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 23:24:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 02 23:24:09 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 02 23:27:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e2c3d463-602c-f780-2a86-11143700f970 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984882ea9800, cur 1556864858 expire 1556864708 last 1556864631 May 02 23:27:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 23:33:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 27614c77-11fd-6dcf-e26a-cffa7b35755b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982885705800, cur 1556865205 expire 1556865055 last 1556864978 May 02 23:33:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 02 23:34:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 23:34:49 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 02 23:44:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 02 23:44:50 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 02 23:44:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 97b571d3-7066-7518-9f4d-1fc69dc6a7d5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983c31e10000, cur 1556865893 expire 1556865743 last 1556865666 May 02 23:44:53 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 02 23:55:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.9@o2ib6) May 02 23:55:40 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 02 23:58:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1230389d-aeba-96dd-fca1-60baa2e7677a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ba02e7400, cur 1556866700 expire 1556866550 last 1556866473 May 02 23:58:20 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 00:08:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 00:08:34 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 03 00:12:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4df8c6b1-51dc-6010-4fe0-9d1cd6584ae6 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9831e469c800, cur 1556867577 expire 1556867427 last 1556867350 May 03 00:12:57 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 03 00:23:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fb911caf-e007-d19d-0ab4-1d2f1342b991 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982aec4f3800, cur 1556868188 expire 1556868038 last 1556867961 May 03 00:23:08 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 00:23:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 00:23:39 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 03 00:33:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0ea3dd46-e104-3244-2691-cb70c35dd1c4 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98388c321000, cur 1556868838 expire 1556868688 last 1556868611 May 03 00:33:58 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 00:37:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 00:37:15 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 00:46:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 70acfceb-171a-b002-d478-76bd6c68048d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9840a0259c00, cur 1556869567 expire 1556869417 last 1556869340 May 03 00:46:07 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 00:48:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 00:48:46 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 03 00:58:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f57ce904-ce0a-04b9-b70d-bd94c750bb0c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984741b3d400, cur 1556870315 expire 1556870165 last 1556870088 May 03 00:58:35 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 00:59:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 00:59:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 01:07:51 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556870864/real 1556870864] req@ffff98298ceb0600 x1632272303606608/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556870871 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 01:07:51 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 03 01:07:58 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556870871/real 1556870871] req@ffff98298ceb0600 x1632272303606608/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556870878 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 01:07:59 fir-md1-s1 kernel: Lustre: 102707:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982bc1a2ad00 x1632090256290720/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:4/0 lens 480/568 e 1 to 0 dl 1556870884 ref 2 fl Interpret:/0/0 rc 0/0 May 03 01:07:59 fir-md1-s1 kernel: Lustre: 102707:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 01:08:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 01:08:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 01:08:12 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556870885/real 
1556870885] req@ffff98298ceb0600 x1632272303606608/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556870892 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 01:08:12 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 01:08:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 01:08:33 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556870906/real 1556870906] req@ffff98298ceb0600 x1632272303606608/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556870913 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 01:08:33 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 01:08:58 fir-md1-s1 kernel: Lustre: 102701:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff985a50a5aa00 x1631584770885728/t0(0) o101->68f316fd-deaf-e6e6-f269-129b8f6e36e4@10.9.107.71@o2ib4:3/0 lens 480/568 e 1 to 0 dl 1556870943 ref 2 fl Interpret:/0/0 rc 0/0 May 03 01:08:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 01:09:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 48f3877f-4797-5154-b20e-abe04e19fe79 (at 10.9.107.71@o2ib4) May 03 01:09:04 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 01:09:09 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556870941/real 1556870941] req@ffff98298ceb0600 x1632272303606608/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556870948 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 01:09:09 fir-md1-s1 kernel: Lustre: 
102363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 03 01:09:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 01:09:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 01:10:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 01:10:05 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 03 01:10:14 fir-md1-s1 kernel: Lustre: 101692:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556871007/real 1556871007] req@ffff985495fa4500 x1632272318980688/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556871014 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 01:10:14 fir-md1-s1 kernel: Lustre: 101692:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages May 03 01:10:33 fir-md1-s1 kernel: Lustre: 102458:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983b82746900 x1631743445858928/t0(0) o101->6a59f5f4-9fe2-de7b-dbb3-3fa96ee5f795@10.9.108.15@o2ib4:8/0 lens 480/568 e 0 to 0 dl 1556871038 ref 2 fl Interpret:/0/0 rc 0/0 May 03 01:10:39 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e8f9820d-a898-b195-411c-3d8a11c8ac92 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c57626800, cur 1556871039 expire 1556870889 last 1556870812 May 03 01:10:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 01:10:55 fir-md1-s1 kernel: Lustre: 102363:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (185:6s); client may timeout. 
req@ffff982bc1a2ad00 x1632090256290720/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:4/0 lens 480/536 e 1 to 0 dl 1556871049 ref 1 fl Complete:/0/0 rc 301/301 May 03 01:21:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 35f63ed0-5e1b-a5dd-b9a5-15c8f8126be3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984bdbf1d800, cur 1556871706 expire 1556871556 last 1556871479 May 03 01:21:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 01:22:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 01:22:40 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages May 03 01:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 71b07175-2e63-3b03-c72d-44a8ae90a894 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983587313800, cur 1556872629 expire 1556872479 last 1556872402 May 03 01:37:09 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 01:37:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 01:37:48 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 01:50:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e67abb51-aaa3-19be-3dbc-4caacc64e59d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983767678c00, cur 1556873452 expire 1556873302 last 1556873225 May 03 01:50:52 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 01:54:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 01:54:22 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 02:03:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 502638b4-fee2-1b8c-cf39-c17e5e932383 (at 10.8.26.4@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9823a5d0fc00, cur 1556874193 expire 1556874043 last 1556873966 May 03 02:03:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 02:27:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 60f2ea10-0e78-5603-438a-5e9b49e30074 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9821c5e59000, cur 1556875626 expire 1556875476 last 1556875399 May 03 02:27:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 02:27:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 02:27:53 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 03:10:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e37145dc-5f1e-c4eb-43eb-3b341d7e0005 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9859ac6a6000, cur 1556878212 expire 1556878062 last 1556877985 May 03 03:10:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:10:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e37145dc-5f1e-c4eb-43eb-3b341d7e0005 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c94683c00, cur 1556878223 expire 1556878073 last 1556877996 May 03 03:10:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 03:11:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 03:11:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:20:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 03:20:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:20:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4140f03b-1dfc-f0a7-c960-8945edd752ff (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff983cc15d3800, cur 1556878859 expire 1556878709 last 1556878632 May 03 03:21:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6b2174c2-8d14-c69d-a2cf-b70d124a6b81 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cc0bcfc00, cur 1556878869 expire 1556878719 last 1556878642 May 03 03:21:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 03:22:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 03:22:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:25:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4696c7f9-bd91-5bb6-81ad-8f0f8ff2f6cd (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9841a2a6f800, cur 1556879114 expire 1556878964 last 1556878887 May 03 03:25:14 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 03 03:25:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 03:25:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:28:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 03:28:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:29:04 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5e5ed2bb-d3bb-2b7b-665a-45caa2858930 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9840b7644c00, cur 1556879344 expire 1556879194 last 1556879117 May 03 03:29:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:30:20 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b05ee27f-ef68-cbec-a0f1-cb67247ab2b1 (at 10.8.26.4@o2ib6) in 214 seconds. I think it's dead, and I am evicting it. 
exp ffff985c1b7d4c00, cur 1556879420 expire 1556879270 last 1556879206 May 03 03:30:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:31:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 03:31:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:36:42 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 03:36:42 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 03 03:38:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 72d6f3a7-cca7-363a-96a9-df666766080a (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c77f12c00, cur 1556879915 expire 1556879765 last 1556879688 May 03 03:38:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 03:39:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 627e76d3-c7eb-8d97-a64e-cb771e022cc0 (at 10.8.26.33@o2ib6) May 03 03:39:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 04:23:37 fir-md1-s1 kernel: Lustre: 102568:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 04:55:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 03a03f15-3bab-e9d9-9faa-41218638973c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9823e4951400, cur 1556884527 expire 1556884377 last 1556884300 May 03 04:55:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 04:56:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 04:56:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 05:04:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 51109e2c-5a1c-a049-2254-66d0dd5d889c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984bf6e5a800, cur 1556885067 expire 1556884917 last 1556884840 May 03 05:04:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 05:04:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 05:04:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 05:13:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 73d46309-aa6d-058e-b935-954ed7c04d8b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9835816a8000, cur 1556885637 expire 1556885487 last 1556885410 May 03 05:13:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 05:15:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 05:15:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 05:20:23 fir-md1-s1 kernel: Lustre: 102646:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 05:24:56 fir-md1-s1 kernel: Lustre: 102731:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 05:24:56 fir-md1-s1 kernel: Lustre: 102731:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages May 03 05:26:25 fir-md1-s1 kernel: Lustre: 102646:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 05:26:25 fir-md1-s1 kernel: Lustre: 102646:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 125 previous similar messages May 03 05:29:52 fir-md1-s1 kernel: Lustre: 101913:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 05:29:52 fir-md1-s1 kernel: Lustre: 101913:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages May 03 06:32:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 795cda94-55c2-2172-3b32-4d79ce01f5f4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9848f3e95c00, cur 1556890377 expire 1556890227 last 1556890150 May 03 06:32:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:33:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 06:33:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:36:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e3844d58-7b07-a82a-d04e-4fefd5bd1f79 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984a28fae800, cur 1556890595 expire 1556890445 last 1556890368 May 03 06:36:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:37:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 06:37:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:40:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 86080157-4cef-640a-5df7-03ad2489370b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981e617d9800, cur 1556890804 expire 1556890654 last 1556890577 May 03 06:40:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:40:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 06:40:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:48:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0b56fcd1-9fe0-663c-f221-1624bfee89ef (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98390c3c9400, cur 1556891295 expire 1556891145 last 1556891068 May 03 06:48:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:48:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 06:48:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:49:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 42457870-9868-0b26-4344-272e3da2dd49 (at 10.8.26.4@o2ib6) in 214 seconds. I think it's dead, and I am evicting it. exp ffff983ce0bb7800, cur 1556891371 expire 1556891221 last 1556891157 May 03 06:49:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:55:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e37e0e64-2f2e-0f99-9039-20009038f1f8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983c9651f400, cur 1556891714 expire 1556891564 last 1556891487 May 03 06:55:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:57:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c759cd7f-a22b-cf06-3a32-a610320b3d8a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98313ceafc00, cur 1556891868 expire 1556891718 last 1556891641 May 03 06:57:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 06:58:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 06:58:32 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 07:04:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 979375a7-f3b9-fdf4-cb4e-0b72f19a98b4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983ccd4be000, cur 1556892294 expire 1556892144 last 1556892067 May 03 07:04:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 07:16:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4d72d8bf-7870-a7b1-6dd1-4412f3232f67 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9836037be000, cur 1556892987 expire 1556892837 last 1556892760 May 03 07:16:27 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 07:23:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 07:23:03 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 03 07:32:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7d8e42fe-bcd7-07b8-347e-0321e1cca86b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9822599df400, cur 1556893939 expire 1556893789 last 1556893712 May 03 07:32:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 07:32:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 07:32:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 07:42:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4127c4d1-c98e-c193-0d1c-f3cd378b503a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983139300800, cur 1556894544 expire 1556894394 last 1556894317 May 03 07:42:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 07:43:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 07:43:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:11:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5a7835e8-621a-db61-297e-ea7cbe12c9c1 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9833b4277c00, cur 1556896300 expire 1556896150 last 1556896073 May 03 08:11:40 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 08:12:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 627e76d3-c7eb-8d97-a64e-cb771e022cc0 (at 10.8.26.33@o2ib6) May 03 08:12:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 08:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ef054150-dfc7-5fe1-a382-ea1061512073 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9832c7e98400, cur 1556897180 expire 1556897030 last 1556896953 May 03 08:26:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:32:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 03 08:32:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:38:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f95662b0-697c-b601-6f35-28b2dbb10221 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98357a673800, cur 1556897927 expire 1556897777 last 1556897700 May 03 08:38:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:39:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 08:39:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:40:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c2b0e589-3588-d905-f10c-bac73af67084 (at 10.8.26.33@o2ib6) in 171 seconds. I think it's dead, and I am evicting it. exp ffff984c4d76ec00, cur 1556898003 expire 1556897853 last 1556897832 May 03 08:40:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:40:59 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 1a3d558b-63af-5c7e-c0b9-f7f8b83c05ef (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985c44f72800, cur 1556898059 expire 1556897909 last 1556897832 May 03 08:40:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 08:41:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a95b25b7-8f79-5668-91fe-fc9eca1a25b2 (at 10.8.26.4@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. exp ffff9847dc677800, cur 1556898079 expire 1556897929 last 1556897868 May 03 08:41:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 08:41:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 627e76d3-c7eb-8d97-a64e-cb771e022cc0 (at 10.8.26.33@o2ib6) May 03 08:41:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:43:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 08:43:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:45:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aac82ab8-0a52-2812-c1ac-bb4f1699e373 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984b70710800, cur 1556898350 expire 1556898200 last 1556898123 May 03 08:45:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 08:48:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 08:48:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:52:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 14afde0d-64e4-8d1e-4b0f-d9b103c23288 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983c43f02400, cur 1556898749 expire 1556898599 last 1556898522 May 03 08:52:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:53:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 08:53:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 08:57:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 03 08:57:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 09:00:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2c462a18-098e-7a73-a679-30c3eb8b507d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9840a7f8ec00, cur 1556899208 expire 1556899058 last 1556898981 May 03 09:00:08 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 09:04:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 09:04:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 09:06:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5858d1f9-928b-9b3d-4cd2-ed8b98db075a (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98220a8c2000, cur 1556899560 expire 1556899410 last 1556899333 May 03 09:06:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 09:08:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dd2bd823-a9e8-ea1a-1912-9791d31c41d7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9823935c8800, cur 1556899729 expire 1556899579 last 1556899502 May 03 09:08:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 09:17:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aeea8dfb-dc0f-143b-04d1-34efebe3c6b0 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982bee36a800, cur 1556900279 expire 1556900129 last 1556900052 May 03 09:17:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 09:21:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 03 09:21:06 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 03 09:28:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4c64004d-4d28-6daa-a2d1-0bea1ed1377c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9839e2ad3c00, cur 1556900936 expire 1556900786 last 1556900709 May 03 09:28:56 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 09:34:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 09:34:35 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 03 09:40:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f6aa1d23-fbe2-bdc2-449f-61ee24385db5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9848d03b6000, cur 1556901631 expire 1556901481 last 1556901404 May 03 09:40:31 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 09:48:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 09:48:51 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 09:53:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c65c7edb-9621-670d-78b0-b6fe77ae1c53 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982b2f2c1c00, cur 1556902436 expire 1556902286 last 1556902209 May 03 09:53:56 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 10:04:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 03 10:04:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 10:13:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ea3c528-b342-aa1e-412a-672520eef780 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9841f5299c00, cur 1556903596 expire 1556903446 last 1556903369 May 03 10:13:16 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 10:15:58 fir-md1-s1 kernel: Lustre: 102446:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 10:15:58 fir-md1-s1 kernel: Lustre: 102446:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages May 03 10:17:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 03 10:17:06 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 10:25:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d9c19008-9827-e671-67a3-a52e4344f679 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983b97f6b800, cur 1556904357 expire 1556904207 last 1556904130 May 03 10:25:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 10:26:55 fir-md1-s1 kernel: Lustre: 102564:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556904408/real 1556904408] req@ffff9826bc2c9500 x1632281601957056/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556904415 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 10:26:55 fir-md1-s1 kernel: Lustre: 102564:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 17 previous similar messages May 03 10:27:03 fir-md1-s1 kernel: Lustre: 102691:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9821f3a07500 x1632093077847872/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:8/0 lens 480/568 e 1 to 0 dl 1556904428 ref 2 fl Interpret:/0/0 rc 0/0 May 03 10:27:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 10:27:09 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 10:27:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 03 10:27:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 10:27:16 fir-md1-s1 kernel: Lustre: 102564:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556904429/real 1556904429] req@ffff9826bc2c9500 x1632281601957056/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556904436 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 10:27:16 fir-md1-s1 kernel: Lustre: 102564:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 10:27:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 10:27:51 fir-md1-s1 kernel: Lustre: 
102564:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556904464/real 1556904464] req@ffff9826bc2c9500 x1632281601957056/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556904471 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 10:27:51 fir-md1-s1 kernel: Lustre: 102564:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 03 10:27:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 10:28:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 10:29:01 fir-md1-s1 kernel: Lustre: 102564:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556904534/real 1556904534] req@ffff9826bc2c9500 x1632281601957056/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556904541 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 10:29:01 fir-md1-s1 kernel: Lustre: 102564:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 03 10:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 10:29:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 10:30:09 fir-md1-s1 kernel: LNet: Service thread pid 102564 was inactive for 200.64s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
May 03 10:30:09 fir-md1-s1 kernel: Pid: 102564, comm: mdt00_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
May 03 10:30:09 fir-md1-s1 kernel: Call Trace:
May 03 10:30:09 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt]
May 03 10:30:09 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt]
May 03 10:30:09 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt]
May 03 10:30:09 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
May 03 10:30:09 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
May 03 10:30:09 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
May 03 10:30:09 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
May 03 10:30:09 fir-md1-s1 kernel: [] 0xffffffffffffffff
May 03 10:30:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556904609.102564
May 03 10:30:26 fir-md1-s1 kernel: LNet: Service thread pid 102564 completed after 217.16s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
May 03 10:41:41 fir-md1-s1 kernel: Lustre: 102352:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556905290/real 1556905290] req@ffff9842da2d4e00 x1632281787046640/t0(0) o104->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556905301 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 10:41:41 fir-md1-s1 kernel: Lustre: 102352:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages May 03 10:41:45 fir-md1-s1 kernel: Lustre: 102692:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98418c2bd100 x1631221893613920/t0(0) o36->74b14fbc-b3cc-e771-44b0-23517ef5c46c@10.9.0.1@o2ib4:20/0 lens 496/448 e 1 to 0 dl 1556905310 ref 2 fl Interpret:/0/0 rc 0/0 May 03 10:41:45 fir-md1-s1 kernel: Lustre: 102692:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 125 previous similar messages May 03 10:41:46 fir-md1-s1 kernel: Lustre: 102375:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984cfaa7b300 x1631657932769248/t0(0) o101->bfaf32fd-a75c-1493-838b-c2682e1a6ae6@10.9.101.15@o2ib4:21/0 lens 576/3264 e 1 to 0 dl 1556905311 ref 2 fl Interpret:/0/0 rc 0/0 May 03 10:41:46 fir-md1-s1 kernel: Lustre: 102375:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 104 previous similar messages May 03 10:41:47 fir-md1-s1 kernel: Lustre: 102375:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984204a65700 x1631536717739072/t0(0) o101->932beadc-171a-aad1-eb40-5517bfcf1974@10.8.15.9@o2ib6:22/0 lens 576/3264 e 1 to 0 dl 1556905312 ref 2 fl Interpret:/0/0 rc 0/0 May 03 10:41:47 fir-md1-s1 kernel: Lustre: 102375:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 74 previous similar messages May 03 10:41:49 fir-md1-s1 kernel: Lustre: 102375:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), 
not sending early reply req@ffff9840b465b300 x1631718726386816/t0(0) o101->e7d8fb90-a64f-4ffd-eb18-9c05d19d0585@10.9.102.17@o2ib4:24/0 lens 576/3264 e 1 to 0 dl 1556905314 ref 2 fl Interpret:/0/0 rc 0/0 May 03 10:41:49 fir-md1-s1 kernel: Lustre: 102375:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 108 previous similar messages May 03 10:41:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 74b14fbc-b3cc-e771-44b0-23517ef5c46c (at 10.9.0.1@o2ib4) reconnecting May 03 10:41:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 10:41:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.0.1@o2ib4) May 03 10:41:51 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages May 03 10:41:53 fir-md1-s1 kernel: Lustre: 102375:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9846b4371500 x1631535044544912/t0(0) o101->f204b046-84eb-2847-0976-09e82296051e@10.9.109.2@o2ib4:28/0 lens 1768/0 e 1 to 0 dl 1556905318 ref 2 fl New:/0/ffffffff rc 0/-1 May 03 10:41:53 fir-md1-s1 kernel: Lustre: 102375:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 135 previous similar messages May 03 10:41:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 028e0b5d-b19e-c19b-b3ed-6fa708300cb4 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9838dc605c00, cur 1556905317 expire 1556905167 last 1556905090 May 03 10:41:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 10:41:58 fir-md1-s1 kernel: Lustre: 102471:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:2s); client may timeout. 
req@ffff9840b465f200 x1631566793099248/t0(0) o101->091aba9e-6e6a-d0c2-b28a-6c100b32f339@10.8.30.22@o2ib6:26/0 lens 576/0 e 1 to 0 dl 1556905316 ref 1 fl Interpret:/0/ffffffff rc 0/-1 May 03 10:41:58 fir-md1-s1 kernel: LustreError: 102457:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.108.23@o2ib4: deadline 20:2s ago req@ffff9840b465a700 x1631543916213232/t0(0) o101->dbdaea0c-6cd1-174f-909a-de8e042f528b@10.9.108.23@o2ib4:26/0 lens 1768/0 e 1 to 0 dl 1556905316 ref 1 fl Interpret:/0/ffffffff rc 0/-1 May 03 10:41:58 fir-md1-s1 kernel: LustreError: 102457:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 14 previous similar messages May 03 10:41:58 fir-md1-s1 kernel: Lustre: 102471:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 100 previous similar messages May 03 10:46:02 fir-md1-s1 kernel: Lustre: 101915:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556905555/real 1556905555] req@ffff9831f2fab900 x1632281863536608/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556905562 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 10:46:02 fir-md1-s1 kernel: Lustre: 101915:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 10:46:20 fir-md1-s1 kernel: Lustre: 102437:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9821a05a8000 x1631546738361024/t0(0) o101->2e1cbf07-8a1f-f8ac-959f-a318bdca8802@10.9.105.18@o2ib4:25/0 lens 480/568 e 0 to 0 dl 1556905585 ref 2 fl Interpret:/0/0 rc 0/0 May 03 10:46:20 fir-md1-s1 kernel: Lustre: 102437:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 184 previous similar messages May 03 10:46:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2e1cbf07-8a1f-f8ac-959f-a318bdca8802 (at 10.9.105.18@o2ib4) reconnecting May 03 10:46:26 fir-md1-s1 kernel: Lustre: Skipped 343 previous similar messages May 03 10:46:37 
fir-md1-s1 kernel: Lustre: 102496:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556905590/real 1556905590] req@ffff9823f0474e00 x1632281869195024/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556905597 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 10:46:37 fir-md1-s1 kernel: Lustre: 102496:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 03 10:46:41 fir-md1-s1 kernel: Lustre: 102537:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9822d298b900 x1631565302539024/t0(0) o101->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:16/0 lens 480/568 e 0 to 0 dl 1556905606 ref 2 fl Interpret:/0/0 rc 0/0 May 03 10:47:47 fir-md1-s1 kernel: Lustre: 102496:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556905660/real 1556905660] req@ffff9823f0474e00 x1632281869195024/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556905667 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 10:47:47 fir-md1-s1 kernel: Lustre: 102496:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 19 previous similar messages May 03 10:51:59 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a5ba3c6b-5824-c9ce-378e-cb79cb25b991 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9848aca76400, cur 1556905919 expire 1556905769 last 1556905692 May 03 10:51:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 10:52:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 10:52:11 fir-md1-s1 kernel: Lustre: Skipped 354 previous similar messages May 03 11:00:16 fir-md1-s1 kernel: Lustre: 101920:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556906409/real 1556906409] req@ffff982795f8a400 x1632282103495984/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556906416 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 11:00:16 fir-md1-s1 kernel: Lustre: 101920:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 03 11:00:24 fir-md1-s1 kernel: Lustre: 102394:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984658bfe000 x1632093238091856/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:29/0 lens 480/568 e 1 to 0 dl 1556906429 ref 2 fl Interpret:/0/0 rc 0/0 May 03 11:00:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 11:00:30 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 11:00:37 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556906430/real 1556906430] req@ffff982776a2aa00 x1632282103496192/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556906437 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 11:00:37 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 03 11:14:57 fir-md1-s1 kernel: LNetError: 101315:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 11:16:55 fir-md1-s1 kernel: 
Lustre: fir-MDT0000: haven't heard from client 0395c81f-44b7-1d8e-ce26-0bbaedce981e (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9858cef61000, cur 1556907415 expire 1556907265 last 1556907188 May 03 11:16:55 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 11:18:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 03 11:18:33 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 03 11:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b63d5e73-9cf1-a4e3-05c5-0e6123396f5a (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9822c20fc400, cur 1556908370 expire 1556908220 last 1556908143 May 03 11:32:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 11:35:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 03 11:35:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 11:45:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e9b10532-56af-da7e-0d06-e7566a35c264 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984ccce43800, cur 1556909139 expire 1556908989 last 1556908912 May 03 11:45:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 11:47:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 03 11:47:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 11:52:32 fir-md1-s1 kernel: Lustre: 102519:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 11:52:32 fir-md1-s1 kernel: Lustre: 102519:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 19 previous similar messages May 03 11:54:34 fir-md1-s1 kernel: Lustre: 102963:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9821b6526c50 x1631552165377120/t0(0) o3->aea5ba3f-2c7e-6072-5c51-f17a66d3e2f1@10.9.115.1@o2ib4:9/0 lens 488/8632 e 1 to 0 dl 1556909679 ref 2 fl Interpret:/0/0 rc 0/0 May 03 11:54:34 fir-md1-s1 kernel: Lustre: 102963:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 12:08:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b97f5529-05a6-671b-124b-7d37b1307439 (at 10.8.30.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c307d0c00, cur 1556910531 expire 1556910381 last 1556910304 May 03 12:08:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 12:08:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b97f5529-05a6-671b-124b-7d37b1307439 (at 10.8.30.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985c8ab69c00, cur 1556910537 expire 1556910387 last 1556910310 May 03 12:08:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 12:09:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7f7fa514-be7d-b6c7-4718-4b4427ec7cee (at 10.8.30.25@o2ib6) May 03 12:09:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 12:14:03 fir-md1-s1 kernel: Lustre: 102519:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556910836/real 1556910836] req@ffff981cea3e4e00 x1632283274566896/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556910843 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 12:14:03 fir-md1-s1 kernel: Lustre: 102519:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 03 12:14:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 076f14de-7dac-4951-1807-4b0246581885 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98393926c400, cur 1556910848 expire 1556910698 last 1556910621 May 03 12:14:10 fir-md1-s1 kernel: Lustre: 102519:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556910843/real 1556910843] req@ffff981cea3e4e00 x1632283274566896/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556910850 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 12:14:11 fir-md1-s1 kernel: Lustre: 102700:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981cc7e28900 x1632093647085568/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:16/0 lens 480/568 e 1 to 0 dl 1556910856 ref 2 fl Interpret:/0/0 rc 0/0 May 03 12:14:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 076f14de-7dac-4951-1807-4b0246581885 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984818fd4800, cur 1556910853 expire 1556910703 last 1556910626 May 03 12:14:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 12:15:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 12:15:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 12:15:16 fir-md1-s1 kernel: Lustre: 102707:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 12:15:19 fir-md1-s1 kernel: Lustre: 102634:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 12:15:30 fir-md1-s1 kernel: Lustre: 102698:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 12:15:46 fir-md1-s1 kernel: Lustre: 102754:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 12:16:47 fir-md1-s1 kernel: Lustre: 102754:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 12:16:47 fir-md1-s1 kernel: Lustre: 102754:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages May 03 12:16:58 fir-md1-s1 kernel: Lustre: 102628:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 03 12:21:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0e30e826-5440-f115-6784-21980943bb7a (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98373621d000, cur 1556911315 expire 1556911165 last 1556911088 May 03 12:23:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 03 12:23:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:01:38 fir-md1-s1 kernel: Lustre: 102446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556913691/real 1556913691] req@ffff9825f3270c00 x1632284042972496/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556913698 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 13:01:38 fir-md1-s1 kernel: Lustre: 102446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 13:01:45 fir-md1-s1 kernel: Lustre: 101919:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556913698/real 1556913698] req@ffff98229538b300 x1632284042972720/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556913705 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 13:01:46 fir-md1-s1 kernel: Lustre: 102537:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98216a2bd100 x1632093980846144/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:21/0 lens 480/568 e 1 to 0 dl 1556913711 ref 2 fl Interpret:/0/0 rc 0/0 May 03 13:01:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cb69cd2d-93d3-bee1-db21-fe2b3f3ca9c6 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9839932ea400, cur 1556913710 expire 1556913560 last 1556913483 May 03 13:01:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:03:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 03 13:03:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:09:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f177c1d4-235a-5dc4-a6f4-a44cfc44c463 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cd5090000, cur 1556914195 expire 1556914045 last 1556913968 May 03 13:09:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:12:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 13:12:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:30:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3aedce70-4bf1-4ae8-3845-eb0f10d1fc87 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984cb2708400, cur 1556915435 expire 1556915285 last 1556915208 May 03 13:30:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:30:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3aedce70-4bf1-4ae8-3845-eb0f10d1fc87 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9838a9b60000, cur 1556915439 expire 1556915289 last 1556915212 May 03 13:30:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 13:31:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 13:31:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:31:17 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556915470/real 1556915470] req@ffff98545b3a4200 x1632284516090096/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556915477 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 13:31:17 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556915470/real 1556915470] req@ffff98267fe6b000 x1632284516089696/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556915477 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 13:31:17 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 13:31:24 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556915477/real 1556915477] req@ffff98267fe6b000 x1632284516089696/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556915484 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 13:31:25 fir-md1-s1 kernel: Lustre: 102493:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9822c8aa2a00 x1632094058617408/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:0/0 lens 480/568 e 1 to 0 dl 1556915490 ref 2 fl Interpret:/0/0 rc 0/0 May 03 13:31:25 fir-md1-s1 kernel: Lustre: 102493:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 13:31:31 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 13:31:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 13:31:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 03 13:31:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:31:31 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556915484/real 1556915484] req@ffff98545b3a4200 x1632284516090096/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556915491 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 13:31:31 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 13:31:38 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556915491/real 1556915491] req@ffff98267fe6b000 x1632284516089696/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556915498 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 13:31:45 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556915498/real 1556915498] req@ffff98267fe6b000 x1632284516089696/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556915505 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 13:31:45 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 13:31:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d58d6902-155e-b288-aeec-34f0613953b8 (at 10.8.11.9@o2ib6) in 176 seconds. I think it's dead, and I am evicting it. 
exp ffff984669e11c00, cur 1556915511 expire 1556915361 last 1556915335 May 03 13:31:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 03 13:31:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d58d6902-155e-b288-aeec-34f0613953b8 (at 10.8.11.9@o2ib6) in 180 seconds. I think it's dead, and I am evicting it. exp ffff983a0c259800, cur 1556915515 expire 1556915365 last 1556915335 May 03 13:32:42 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ca4794ff-a60b-5b4d-2794-7451fd07ead5 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981d0ad80c00, cur 1556915562 expire 1556915412 last 1556915335 May 03 13:34:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 03 13:39:57 fir-md1-s1 kernel: Lustre: 102493:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556915990/real 1556915990] req@ffff9823a1e0da00 x1632284660947248/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556915997 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 13:39:57 fir-md1-s1 kernel: Lustre: 102493:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 13:40:05 fir-md1-s1 kernel: Lustre: 102557:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98240461a400 x1631920565432816/t0(0) o101->81ba03cb-100f-75de-7e7c-84a709257c80@10.9.108.7@o2ib4:10/0 lens 480/568 e 1 to 0 dl 1556916010 ref 2 fl Interpret:/0/0 rc 0/0 May 03 13:40:05 fir-md1-s1 kernel: Lustre: 102557:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 13:40:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 81ba03cb-100f-75de-7e7c-84a709257c80 (at 10.9.108.7@o2ib4) reconnecting May 03 13:40:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 13:40:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: 
Connection restored to (at 10.9.108.7@o2ib4) May 03 13:40:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:40:18 fir-md1-s1 kernel: Lustre: 102493:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556916011/real 1556916011] req@ffff9823a1e0da00 x1632284660947248/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556916018 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 13:40:18 fir-md1-s1 kernel: Lustre: 102493:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 13:40:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f288cf55-7f62-a214-4a98-fa3df403c209 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983c05248400, cur 1556916021 expire 1556915871 last 1556915794 May 03 13:40:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 13:46:43 fir-md1-s1 kernel: Lustre: 101684:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982c6f24b600 x1632094097275376/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/568 e 0 to 0 dl 1556916408 ref 2 fl Interpret:/0/0 rc 0/0 May 03 13:46:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 13:46:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 03 13:46:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:46:49 fir-md1-s1 kernel: Lustre: 102700:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556916378/real 1556916378] req@ffff98208a340600 x1632284775491248/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556916409 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 13:46:49 fir-md1-s1 kernel: Lustre: 
102628:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556916378/real 1556916378] req@ffff9821fdb12700 x1632284775491104/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556916409 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 13:47:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 03 13:47:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 13:49:24 fir-md1-s1 kernel: Lustre: 102628:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556916533/real 1556916533] req@ffff9821fdb12700 x1632284775491104/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556916564 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 13:49:24 fir-md1-s1 kernel: Lustre: 102628:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages May 03 13:49:24 fir-md1-s1 kernel: LustreError: 102628:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff9821fdb12700 x1632284775491104 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9831ecfe18c0/0xce885391ec839a1a lrc: 4/0,0 mode: PW/PW res: [0x20002202b:0x1:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0xd1b0d5fa7e3e980d expref: 65 pid: 102500 timeout: 0 lvb_type: 0 May 03 13:49:24 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 03 13:49:24 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 481s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff982a1fb886c0/0xce885391ec95a008 lrc: 4/0,0 mode: PW/PW res: [0x20002202b:0x2:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 
0xd1b0d5fa7e3e9a21 expref: 66 pid: 102535 timeout: 0 lvb_type: 0 May 03 13:49:24 fir-md1-s1 kernel: Lustre: 102700:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (185:1s); client may timeout. req@ffff982c6f24b600 x1632094097275376/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/536 e 0 to 0 dl 1556916563 ref 1 fl Complete:/0/0 rc 301/301 May 03 13:49:24 fir-md1-s1 kernel: LustreError: 102628:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 03 13:49:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2d89ecf0-6152-5faf-bd31-446a85cf5283 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982cd9496400, cur 1556916574 expire 1556916424 last 1556916347 May 03 13:49:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 13:50:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 13:50:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 14:08:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.101.63@o2ib4, removing former export from same NID May 03 14:08:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 75a42419-1c36-3d84-69b0-0982bb5ad919 (at 10.9.101.63@o2ib4) reconnecting May 03 14:08:50 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 03 14:08:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 586563e7-23da-b7c7-408b-81f7cf74781b (at 10.9.101.63@o2ib4) May 03 14:08:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 14:09:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.63@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
May 03 14:09:21 fir-md1-s1 kernel: LustreError: Skipped 4 previous similar messages May 03 14:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 75690e50-9cef-96d9-35c3-91f964088184 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983ce4b95c00, cur 1556918162 expire 1556918012 last 1556917935 May 03 14:16:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 14:16:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 75690e50-9cef-96d9-35c3-91f964088184 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9849d7791c00, cur 1556918175 expire 1556918025 last 1556917948 May 03 14:16:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 14:16:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 14:16:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 14:18:30 fir-md1-s1 kernel: Lustre: 102481:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556918303/real 1556918303] req@ffff98208a347b00 x1632285345540208/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556918310 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 14:18:30 fir-md1-s1 kernel: Lustre: 102481:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 14:18:38 fir-md1-s1 kernel: Lustre: 101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98214e68dd00 x1632094169247840/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:13/0 lens 480/568 e 1 to 0 dl 1556918323 ref 2 fl Interpret:/0/0 rc 0/0 May 03 14:18:38 fir-md1-s1 kernel: Lustre: 101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 14:18:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 14:18:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 14:18:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 03 14:18:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 14:19:05 fir-md1-s1 kernel: Lustre: 102481:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556918338/real 1556918338] req@ffff98208a347b00 x1632285345540208/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556918345 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 14:19:05 fir-md1-s1 kernel: Lustre: 102481:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 03 14:19:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e2d5c79-3c29-4573-80fd-0479f8ebef28 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983bcdf7f000, cur 1556918351 expire 1556918201 last 1556918124 May 03 14:26:01 fir-md1-s1 kernel: Lustre: 102459:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff981ccfcc8600 x1632094190168768/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:6/0 lens 480/568 e 0 to 0 dl 1556918766 ref 2 fl Interpret:/0/0 rc 0/0 May 03 14:26:07 fir-md1-s1 kernel: Lustre: 101920:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556918736/real 1556918736] req@ffff98239d229500 x1632285469420256/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556918767 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 14:26:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 14:26:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 03 14:26:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: 
Connection restored to (at 10.0.10.3@o2ib7) May 03 14:26:07 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 03 14:26:09 fir-md1-s1 kernel: LustreError: 101920:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff98239d229500 x1632285469420256 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff98268dfee540/0xce88539208d8ca62 lrc: 4/0,0 mode: PW/PW res: [0x2c0023ff4:0x1:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0xdfefce4c7a8020cd expref: 14 pid: 102437 timeout: 0 lvb_type: 0 May 03 14:26:09 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 03 14:26:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 03 14:26:09 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 255s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff98268dfee540/0xce88539208d8ca62 lrc: 5/0,0 mode: PW/PW res: [0x2c0023ff4:0x1:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0xdfefce4c7a8020cd expref: 15 pid: 102437 timeout: 0 lvb_type: 0 May 03 14:26:14 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5abe3554-6aaf-e73d-7fd8-165fdb2eebd3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98276a321000, cur 1556918774 expire 1556918624 last 1556918547 May 03 14:26:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 14:28:30 fir-md1-s1 kernel: Lustre: 102493:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556918903/real 1556918903] req@ffff98208a340900 x1632285519993600/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556918910 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 14:28:30 fir-md1-s1 kernel: Lustre: 102493:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 14:28:48 fir-md1-s1 kernel: Lustre: 101913:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9822cba9b300 x1632094196044080/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:23/0 lens 480/568 e 0 to 0 dl 1556918933 ref 2 fl Interpret:/0/0 rc 0/0 May 03 14:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 14:28:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 03 14:28:54 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 03 14:29:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 946a91e3-4e18-fe36-43ed-463403d1cb75 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9839fbe7ec00, cur 1556918997 expire 1556918847 last 1556918770 May 03 14:38:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 03 14:38:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 14:38:29 fir-md1-s1 kernel: LustreError: 102537:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff9822fec99200 x1632285694951488 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff98366da0e780/0xce8853921024b00f lrc: 4/0,0 mode: PW/PW res: [0x200022031:0x2:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x9ea6573680094512 expref: 69 pid: 102693 timeout: 0 lvb_type: 0 May 03 14:38:29 fir-md1-s1 kernel: LustreError: 102537:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 2 previous similar messages May 03 14:38:29 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 03 14:38:29 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages May 03 14:38:29 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 360s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff98366da0e780/0xce8853921024b00f lrc: 4/0,0 mode: PW/PW res: [0x200022031:0x2:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x9ea6573680094512 expref: 70 pid: 102693 timeout: 0 lvb_type: 0 May 03 14:38:29 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages May 03 14:38:31 fir-md1-s1 kernel: LustreError: 102532:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff981cca758900 x1632285695717472 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: 
ffff983be3229200/0xce8853921092a6ab lrc: 4/0,0 mode: PW/PW res: [0x2c0023ff2:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x9ea6573680094910 expref: 225 pid: 102360 timeout: 0 lvb_type: 0 May 03 14:38:31 fir-md1-s1 kernel: LustreError: 102532:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 03 14:38:31 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 03 14:38:31 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 03 14:38:31 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 325s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff983be3229200/0xce8853921092a6ab lrc: 4/0,0 mode: PW/PW res: [0x2c0023ff2:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x9ea6573680094910 expref: 226 pid: 102360 timeout: 0 lvb_type: 0 May 03 14:39:11 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d4fcea37-38be-0da3-ff2f-ff5f0dc607f2 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ec52ee400, cur 1556919551 expire 1556919401 last 1556919324 May 03 14:39:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 14:40:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9c0c610d-0561-9b9a-98c4-0ad0384caf27 (at 10.9.101.51@o2ib4) reconnecting May 03 14:40:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 14:40:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.51@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
May 03 14:41:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.101.63@o2ib4, removing former export from same NID May 03 14:42:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.101.51@o2ib4, removing former export from same NID May 03 14:42:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.101.51@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. May 03 14:45:24 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556919917/real 1556919917] req@ffff982202881500 x1632285817027088/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556919924 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 14:45:24 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 24 previous similar messages May 03 14:45:32 fir-md1-s1 kernel: Lustre: 102481:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9823de779b00 x1632094230654160/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:7/0 lens 480/568 e 1 to 0 dl 1556919937 ref 2 fl Interpret:/0/0 rc 0/0 May 03 14:45:32 fir-md1-s1 kernel: Lustre: 102481:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 14:45:59 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556919952/real 1556919952] req@ffff982202881500 x1632285817027088/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556919959 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 14:45:59 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 03 14:46:14 fir-md1-s1 kernel: LustreError: 102537:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client 
(nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff982202881500 x1632285817027088 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff981f4c6c0900/0xce8853921589f0a1 lrc: 4/0,0 mode: PW/PW res: [0x2c0023ff3:0x3:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0xeff6fef991f5804d expref: 17 pid: 102628 timeout: 0 lvb_type: 0 May 03 14:46:14 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 03 14:46:14 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 358s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff981f4c6c0900/0xce8853921589f0a1 lrc: 4/0,0 mode: PW/PW res: [0x2c0023ff3:0x3:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0xeff6fef991f5804d expref: 18 pid: 102628 timeout: 0 lvb_type: 0 May 03 14:46:19 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 193f5aaf-6471-33be-9034-a8acdf85cb2d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982720406400, cur 1556919979 expire 1556919829 last 1556919752 May 03 14:56:47 fir-md1-s1 kernel: Lustre: 102485:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982096f12100 x1632094251491536/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:22/0 lens 480/568 e 1 to 0 dl 1556920612 ref 2 fl Interpret:/0/0 rc 0/0 May 03 14:56:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 14:56:53 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 03 14:56:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 03 14:56:53 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 03 14:57:03 fir-md1-s1 kernel: Lustre: 101905:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556920592/real 1556920592] req@ffff982b4f9c2400 x1632286020101968/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556920623 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 14:57:03 fir-md1-s1 kernel: Lustre: 101905:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 14:58:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3a7b3379-8114-b28e-8d07-28bf2279cfcb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983945e1f400, cur 1556920731 expire 1556920581 last 1556920504 May 03 14:58:59 fir-md1-s1 kernel: Lustre: 101905:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (146:1s); client may timeout. 
req@ffff982096f12100 x1632094251491536/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:22/0 lens 480/536 e 1 to 0 dl 1556920738 ref 1 fl Complete:/0/0 rc 301/301 May 03 14:58:59 fir-md1-s1 kernel: Lustre: 101905:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message May 03 15:07:51 fir-md1-s1 kernel: Lustre: 102758:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556921264/real 1556921264] req@ffff982195cb0f00 x1632286230750128/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556921271 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 15:07:51 fir-md1-s1 kernel: Lustre: 102758:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 03 15:08:09 fir-md1-s1 kernel: Lustre: 102532:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9821a6bf6300 x1632094275387120/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:14/0 lens 480/568 e 0 to 0 dl 1556921294 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:08:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 15:08:16 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 15:08:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 03 15:08:16 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 03 15:10:08 fir-md1-s1 kernel: Lustre: 102557:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982a92a2ce00 x1631221895811712/t0(0) o36->74b14fbc-b3cc-e771-44b0-23517ef5c46c@10.9.0.1@o2ib4:13/0 lens 496/448 e 1 to 0 dl 1556921413 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:10:09 fir-md1-s1 kernel: Lustre: 102493:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff982216ff0c00 x1631547202849728/t0(0) o101->aba5d4eb-e07c-9b0f-6ab5-7f97caf38a26@10.8.16.4@o2ib6:14/0 lens 576/3264 e 1 to 0 dl 1556921414 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:10:09 fir-md1-s1 kernel: Lustre: 102493:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 74 previous similar messages May 03 15:10:11 fir-md1-s1 kernel: Lustre: 102490:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983b7e26e900 x1631546420695248/t0(0) o101->d9dead47-18af-0bad-b841-6c3aac9d942a@10.8.28.7@o2ib6:16/0 lens 576/3264 e 1 to 0 dl 1556921416 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:10:11 fir-md1-s1 kernel: Lustre: 102490:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages May 03 15:10:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c8ba4227-c835-d772-68cd-176ab2dba0c2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981cf8a87800, cur 1556921413 expire 1556921263 last 1556921186 May 03 15:10:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 15:15:31 fir-md1-s1 kernel: Lustre: 102707:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556921724/real 1556921724] req@ffff9822bbfb8300 x1632286376752512/t0(0) o106->fir-MDT0002@10.8.10.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556921731 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 15:15:31 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556921724/real 1556921724] req@ffff98221f4cad00 x1632286376752896/t0(0) o106->fir-MDT0002@10.8.10.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556921731 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 15:15:31 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 24 previous similar messages May 03 15:15:49 fir-md1-s1 kernel: Lustre: 
102496:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9822d19a8300 x1632094296309488/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:24/0 lens 480/568 e 0 to 0 dl 1556921754 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:15:49 fir-md1-s1 kernel: Lustre: 102496:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages May 03 15:18:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.9@o2ib6) May 03 15:18:28 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages May 03 15:24:27 fir-md1-s1 kernel: Lustre: 101685:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556922260/real 1556922260] req@ffff982a36fa0000 x1632286547157440/t0(0) o106->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556922267 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 15:24:27 fir-md1-s1 kernel: Lustre: 101685:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 13 previous similar messages May 03 15:24:35 fir-md1-s1 kernel: Lustre: 102524:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9822452c8f00 x1632094329554864/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:10/0 lens 480/568 e 1 to 0 dl 1556922280 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:24:35 fir-md1-s1 kernel: Lustre: 102524:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 15:24:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 15:24:41 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 03 15:25:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f707532b-fbf3-a25a-213f-6ee216d9e3ee (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983b3ec51400, cur 1556922356 expire 1556922206 last 1556922129 May 03 15:25:56 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 03 15:40:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eac8837a-ac82-cdac-3334-a9d3b43c4206 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9840f9ea6400, cur 1556923231 expire 1556923081 last 1556923004 May 03 15:40:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 15:41:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 15:41:06 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages May 03 15:55:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1ecffc4c-5025-d47d-e2cd-93895258213b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983ffaf76000, cur 1556924123 expire 1556923973 last 1556923896 May 03 15:55:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 15:58:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 15:58:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 16:06:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client deb77812-f584-b192-ea35-1d7dd75e984c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983a7ac0a400, cur 1556924809 expire 1556924659 last 1556924582 May 03 16:06:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 16:08:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 16:08:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 16:20:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 75a42419-1c36-3d84-69b0-0982bb5ad919 (at 10.9.101.63@o2ib4) reconnecting May 03 16:20:20 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 03 16:20:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 586563e7-23da-b7c7-408b-81f7cf74781b (at 10.9.101.63@o2ib4) May 03 16:20:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 16:20:20 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.101.63@o2ib4, removing former export from same NID May 03 16:20:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 16:20:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 16:28:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 77f69608-fd9a-eaf7-473b-9ae8ad4455d7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983800f2e400, cur 1556926083 expire 1556925933 last 1556925856 May 03 16:28:03 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 03 16:33:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 16:33:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 16:39:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 86121a96-c964-720d-4196-e7e8af9a6c1b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983cff53ec00, cur 1556926781 expire 1556926631 last 1556926554 May 03 16:39:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 16:40:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 16:40:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 17:07:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c5fe7dc5-6be6-753c-1bfb-3059cbdab4cc (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9831bea28800, cur 1556928475 expire 1556928325 last 1556928248 May 03 17:07:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 17:08:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 17:08:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 17:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cbd0bde2-1d67-c3c9-d9e1-2825c9dc4656 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98286cd8b800, cur 1556931367 expire 1556931217 last 1556931140 May 03 17:56:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 17:56:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 03 17:56:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 18:01:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client dc913393-987d-4c1e-4e3b-a516561928d7 (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982c66e7d400, cur 1556931693 expire 1556931543 last 1556931466 May 03 18:01:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 18:01:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a327e393-246e-f0b0-a4c7-257350ff9a2e (at 10.9.112.13@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff985cbc128400, cur 1556931707 expire 1556931557 last 1556931480 May 03 18:01:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a327e393-246e-f0b0-a4c7-257350ff9a2e (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982cfbd5b800, cur 1556931716 expire 1556931566 last 1556931489 May 03 22:26:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 50eec116-2bbe-0010-8e94-90a3b26eabed (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983bd72a5c00, cur 1556947612 expire 1556947462 last 1556947385 May 03 22:27:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 627e76d3-c7eb-8d97-a64e-cb771e022cc0 (at 10.8.26.33@o2ib6) May 03 22:27:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 23:28:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 71191044-c535-199c-f761-3a1e66b979bf (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9829cde2c800, cur 1556951286 expire 1556951136 last 1556951059 May 03 23:28:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 03 23:28:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.9.8@o2ib6) May 03 23:28:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 00:52:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d48933ec-1431-e44d-6812-3036ccaf11ec (at 10.9.108.70@o2ib4) reconnecting May 04 00:52:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 00:52:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b02be38d-1861-729c-ba8a-e1ac2e35bb91 (at 10.9.108.70@o2ib4) May 04 00:52:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 01:45:27 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 04 05:15:03 fir-md1-s1 kernel: Lustre: 102389:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556972096/real 1556972096] req@ffff984a38e12a00 x1632302153652080/t0(0) o104->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556972103 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 05:15:03 fir-md1-s1 kernel: Lustre: 102389:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 24 previous similar messages May 04 05:15:11 fir-md1-s1 kernel: Lustre: 101918:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983f70a9fb00 x1631533481867968/t0(0) o101->d28dc93e-4982-4ddf-ec8e-785b339e0e10@10.9.115.12@o2ib4:16/0 lens 480/568 e 1 to 0 dl 1556972116 ref 2 fl Interpret:/0/0 rc 0/0 May 04 05:15:11 fir-md1-s1 kernel: Lustre: 101918:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 04 05:15:17 fir-md1-s1 kernel: Lustre: 101686:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply 
req@ffff983c30e8f500 x1631533481868528/t0(0) o101->d28dc93e-4982-4ddf-ec8e-785b339e0e10@10.9.115.12@o2ib4:22/0 lens 480/568 e 1 to 0 dl 1556972122 ref 2 fl Interpret:/0/0 rc 0/0 May 04 05:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d28dc93e-4982-4ddf-ec8e-785b339e0e10 (at 10.9.115.12@o2ib4) reconnecting May 04 05:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.115.12@o2ib4) May 04 05:15:31 fir-md1-s1 kernel: LustreError: 102389:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.106.54@o2ib4) failed to reply to blocking AST (req@ffff984a38e12a00 x1632302153652080 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9827a2afde80/0xce8853947c1dc168 lrc: 4/0,0 mode: PR/PR res: [0x2c001a787:0x1771:0x0].0x0 bits 0x5b/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.9.106.54@o2ib4 remote: 0x1481c1e0951d74b4 expref: 5583 pid: 88046 timeout: 497425 lvb_type: 0 May 04 05:15:31 fir-md1-s1 kernel: LustreError: 102389:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 2 previous similar messages May 04 05:15:31 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.106.54@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 04 05:15:31 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages May 04 05:15:31 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.106.54@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff9827a2afde80/0xce8853947c1dc168 lrc: 3/0,0 mode: PR/PR res: [0x2c001a787:0x1771:0x0].0x0 bits 0x5b/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.9.106.54@o2ib4 remote: 0x1481c1e0951d74b4 expref: 5584 pid: 88046 timeout: 0 lvb_type: 0 May 04 05:15:31 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message May 04 05:18:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 
383440be-7ca6-4d43-d52b-759fc8a58b5d (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982cfbd5d800, cur 1556972297 expire 1556972147 last 1556972070 May 04 05:18:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 05:20:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.54@o2ib4) May 04 06:00:57 fir-md1-s1 kernel: Lustre: 102389:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556974848/real 1556974848] req@ffff984bca349b00 x1632302938129648/t0(0) o104->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556974856 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 06:00:57 fir-md1-s1 kernel: Lustre: 102389:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages May 04 06:01:06 fir-md1-s1 kernel: Lustre: 103233:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556974858/real 1556974858] req@ffff98557ae1c800 x1632302940612976/t0(0) o104->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556974866 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 06:01:06 fir-md1-s1 kernel: Lustre: 103233:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 04 06:01:07 fir-md1-s1 kernel: Lustre: 102646:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9820daed2700 x1631546816647344/t0(0) o101->cd5061ed-0afc-d910-1e43-3dc27bc58135@10.8.26.32@o2ib6:12/0 lens 480/568 e 1 to 0 dl 1556974872 ref 2 fl Interpret:/0/0 rc 0/0 May 04 06:01:08 fir-md1-s1 kernel: Lustre: 101913:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98229fb8ef00 x1631535951658448/t0(0) o101->4fd3697b-8ac3-d03c-d547-c2a2aae5b292@10.8.28.8@o2ib6:13/0 lens 480/568 e 1 to 0 dl 1556974873 ref 2 fl Interpret:/0/0 rc 0/0 May 04 06:01:10 fir-md1-s1 kernel: Lustre: 
102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983ced202d00 x1631558992038096/t0(0) o101->f7eea6b3-9a9f-841e-99ed-068aea1c8860@10.9.106.47@o2ib4:15/0 lens 480/568 e 1 to 0 dl 1556974875 ref 2 fl Interpret:/0/0 rc 0/0 May 04 06:01:13 fir-md1-s1 kernel: Lustre: 88046:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff985a2c326f00 x1631544173086096/t0(0) o101->2807f26b-3f0f-6eef-511b-1841519e3d83@10.9.108.22@o2ib4:18/0 lens 480/568 e 1 to 0 dl 1556974878 ref 2 fl Interpret:/0/0 rc 0/0 May 04 06:01:13 fir-md1-s1 kernel: Lustre: 88046:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 04 06:01:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client cd5061ed-0afc-d910-1e43-3dc27bc58135 (at 10.8.26.32@o2ib6) reconnecting May 04 06:01:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 65df4257-c311-0104-8637-611580d202b5 (at 10.8.26.32@o2ib6) May 04 06:01:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 06:01:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 4fd3697b-8ac3-d03c-d547-c2a2aae5b292 (at 10.8.28.8@o2ib6) reconnecting May 04 06:01:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 0b766838-89ea-3d2e-06ca-f7727d84cf43 (at 10.8.28.8@o2ib6) May 04 06:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f7eea6b3-9a9f-841e-99ed-068aea1c8860 (at 10.9.106.47@o2ib4) reconnecting May 04 06:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.106.47@o2ib4) May 04 06:01:17 fir-md1-s1 kernel: Lustre: 102603:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff985bbebb6f00 x1631559066815552/t0(0) o101->a2643c51-ed30-6fc6-ba4f-67e217a258b1@10.9.102.5@o2ib4:22/0 lens 480/568 e 1 to 0 dl 1556974882 ref 2 fl Interpret:/0/0 rc 0/0 May 04 06:01:17 fir-md1-s1 kernel: Lustre: 
102603:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages May 04 06:01:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2807f26b-3f0f-6eef-511b-1841519e3d83 (at 10.9.108.22@o2ib4) reconnecting May 04 06:01:19 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 04 06:01:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 8d8047e2-8ed7-ed8e-cdc8-a3cc9918a22e (at 10.9.108.22@o2ib4) May 04 06:01:20 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 04 06:01:21 fir-md1-s1 kernel: LustreError: 102389:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.106.54@o2ib4) failed to reply to blocking AST (req@ffff984bca349b00 x1632302938129648 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff981f57201200/0xce885394a5c9c84c lrc: 4/0,0 mode: PR/PR res: [0x2c001bdcd:0x16a6:0x0].0x0 bits 0x5b/0x0 rrc: 50 type: IBT flags: 0x60200400000020 nid: 10.9.106.54@o2ib4 remote: 0xeead2a965a45dd8c expref: 2115 pid: 102416 timeout: 500173 lvb_type: 0 May 04 06:01:21 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.106.54@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 04 06:01:21 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 33s: evicting client at 10.9.106.54@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff981f57201200/0xce885394a5c9c84c lrc: 3/0,0 mode: PR/PR res: [0x2c001bdcd:0x16a6:0x0].0x0 bits 0x5b/0x0 rrc: 50 type: IBT flags: 0x60200400000020 nid: 10.9.106.54@o2ib4 remote: 0xeead2a965a45dd8c expref: 2116 pid: 102416 timeout: 0 lvb_type: 0 May 04 06:01:21 fir-md1-s1 kernel: Lustre: 102633:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. 
req@ffff984c6d393c00 x1631558749014688/t0(0) o101->ed4bb535-6b9d-701d-993b-133faa2d1314@10.9.105.25@o2ib4:20/0 lens 480/536 e 1 to 0 dl 1556974880 ref 1 fl Complete:/0/0 rc 0/0 May 04 06:04:19 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 77ce0fe1-1130-1578-0477-e0de5e496c3c (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985904656c00, cur 1556975059 expire 1556974909 last 1556974832 May 04 06:04:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 06:04:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9a943a6e-ba71-6a16-d3e1-22dafa089e55 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981d16302000, cur 1556975066 expire 1556974916 last 1556974839 May 04 06:06:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.54@o2ib4) May 04 06:06:13 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 04 06:13:11 fir-md1-s1 kernel: Lustre: 102549:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 04 06:13:11 fir-md1-s1 kernel: Lustre: 102549:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 04 07:02:07 fir-md1-s1 kernel: Lustre: 102436:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556978520/real 1556978520] req@ffff9853d8a38000 x1632303932619312/t0(0) o104->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556978527 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 07:02:07 fir-md1-s1 kernel: Lustre: 102436:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 41 previous similar messages May 04 07:02:10 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556978523/real 1556978523] req@ffff9821c450bc00 x1632303933552192/t0(0) o104->fir-MDT0000@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 
1556978530 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 07:02:10 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 04 07:02:15 fir-md1-s1 kernel: Lustre: 102549:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982a9276e900 x1632097232806272/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:20/0 lens 480/568 e 1 to 0 dl 1556978540 ref 2 fl Interpret:/0/0 rc 0/0 May 04 07:02:15 fir-md1-s1 kernel: Lustre: 102512:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556978528/real 1556978528] req@ffff982e0b2aad00 x1632303933078352/t0(0) o104->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556978535 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 07:02:15 fir-md1-s1 kernel: Lustre: 102512:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 04 07:02:16 fir-md1-s1 kernel: Lustre: 102544:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9830eff70f00 x1631536251854640/t0(0) o101->c9ce12e9-3cda-482a-6a30-1bff01061762@10.8.8.36@o2ib6:21/0 lens 480/568 e 1 to 0 dl 1556978541 ref 2 fl Interpret:/0/0 rc 0/0 May 04 07:02:16 fir-md1-s1 kernel: Lustre: 102544:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 04 07:02:20 fir-md1-s1 kernel: Lustre: 102401:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983f72616c00 x1631559810109792/t0(0) o101->6818c063-a70b-0d7a-5ae5-0dc447ff5658@10.9.105.14@o2ib4:25/0 lens 480/568 e 1 to 0 dl 1556978545 ref 2 fl Interpret:/0/0 rc 0/0 May 04 07:02:20 fir-md1-s1 kernel: Lustre: 102401:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 04 07:02:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
f7eea6b3-9a9f-841e-99ed-068aea1c8860 (at 10.9.106.47@o2ib4) reconnecting May 04 07:02:21 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 04 07:02:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.106.47@o2ib4) May 04 07:02:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 07:02:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c9ce12e9-3cda-482a-6a30-1bff01061762 (at 10.8.8.36@o2ib6) reconnecting May 04 07:02:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 07:02:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.8.36@o2ib6) May 04 07:02:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 07:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4dfcb63c-6052-245d-ccff-2613be179683 (at 10.9.108.58@o2ib4) reconnecting May 04 07:02:25 fir-md1-s1 kernel: Lustre: 102617:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984ca1323c00 x1631546817938224/t0(0) o101->cd5061ed-0afc-d910-1e43-3dc27bc58135@10.8.26.32@o2ib6:29/0 lens 480/568 e 1 to 0 dl 1556978549 ref 2 fl Interpret:/0/0 rc 0/0 May 04 07:02:25 fir-md1-s1 kernel: Lustre: 102617:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 04 07:02:25 fir-md1-s1 kernel: Lustre: 103196:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556978538/real 1556978538] req@ffff9853d8a39500 x1632303937864560/t0(0) o104->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556978545 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 07:02:25 fir-md1-s1 kernel: Lustre: 103196:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23 previous similar messages May 04 07:02:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.105.14@o2ib4) May 04 07:02:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 07:02:27 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client 74fb56c5-8bc6-38a9-8624-788945b7232f (at 10.9.115.2@o2ib4) reconnecting May 04 07:02:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 07:02:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 65df4257-c311-0104-8637-611580d202b5 (at 10.8.26.32@o2ib6) May 04 07:02:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 07:02:33 fir-md1-s1 kernel: Lustre: 102353:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9839926bd700 x1631607762319936/t0(0) o101->cd6a890f-3ae3-4002-9a6f-a0a5b59c9ffb@10.8.7.9@o2ib6:7/0 lens 480/568 e 1 to 0 dl 1556978557 ref 2 fl Interpret:/0/0 rc 0/0 May 04 07:02:33 fir-md1-s1 kernel: Lustre: 102353:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 04 07:02:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 75a42419-1c36-3d84-69b0-0982bb5ad919 (at 10.9.101.63@o2ib4) reconnecting May 04 07:02:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 07:02:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.108.17@o2ib4) May 04 07:02:39 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 04 07:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c9ce12e9-3cda-482a-6a30-1bff01061762 (at 10.8.8.36@o2ib6) reconnecting May 04 07:02:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 04 07:02:44 fir-md1-s1 kernel: Lustre: 102578:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556978557/real 1556978557] req@ffff983c2262d100 x1632303935291360/t0(0) o104->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556978564 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 07:02:44 fir-md1-s1 kernel: Lustre: 102578:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 85 previous similar messages May 04 07:02:49 fir-md1-s1 kernel: Lustre: 
101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982241fc0f00 x1631561382344896/t0(0) o101->0ec68aab-db80-1651-1da4-04c983cca199@10.9.106.57@o2ib4:24/0 lens 480/568 e 1 to 0 dl 1556978574 ref 2 fl Interpret:/0/0 rc 0/0 May 04 07:02:49 fir-md1-s1 kernel: Lustre: 101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages May 04 07:02:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.106.57@o2ib4) May 04 07:02:55 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages May 04 07:03:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8ef628ba-0f98-b525-586c-934df1f86cd6 (at 10.9.106.39@o2ib4) reconnecting May 04 07:03:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages May 04 07:03:21 fir-md1-s1 kernel: Lustre: 102659:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9828f4bc7500 x1631560212054320/t0(0) o101->dca21337-fdba-5128-347e-592b37646902@10.9.108.60@o2ib4:26/0 lens 480/568 e 0 to 0 dl 1556978606 ref 2 fl Interpret:/0/0 rc 0/0 May 04 07:03:21 fir-md1-s1 kernel: Lustre: 102659:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 38 previous similar messages May 04 07:03:21 fir-md1-s1 kernel: Lustre: 102452:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556978594/real 1556978594] req@ffff985590704500 x1632303939971776/t0(0) o104->fir-MDT0002@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556978601 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 07:03:21 fir-md1-s1 kernel: Lustre: 102452:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 360 previous similar messages May 04 07:03:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.115.12@o2ib4) May 04 07:03:27 fir-md1-s1 kernel: Lustre: Skipped 67 previous similar messages May 04 07:03:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: 
Client ca15d879-1cb2-8780-e5e2-20230d9e27cf (at 10.8.28.3@o2ib6) reconnecting May 04 07:03:33 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages May 04 07:04:26 fir-md1-s1 kernel: Lustre: 102553:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-16), not sending early reply req@ffff982dd3797200 x1631559450905872/t0(0) o101->ef5fd4bc-3ade-f022-c480-f42cc4ae70e5@10.9.105.22@o2ib4:1/0 lens 480/568 e 0 to 0 dl 1556978671 ref 2 fl Interpret:/0/0 rc 0/0 May 04 07:04:26 fir-md1-s1 kernel: Lustre: 102553:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 55 previous similar messages May 04 07:04:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.107.42@o2ib4) May 04 07:04:32 fir-md1-s1 kernel: Lustre: Skipped 210 previous similar messages May 04 07:04:34 fir-md1-s1 kernel: LustreError: 102436:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.106.54@o2ib4) failed to reply to blocking AST (req@ffff9853d8a38000 x1632303932619312 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff984c64386780/0xce885394e31afb1b lrc: 4/0,0 mode: PR/PR res: [0x2c001bd52:0xd3a:0x0].0x0 bits 0x40/0x0 rrc: 46 type: IBT flags: 0x60000400000020 nid: 10.9.106.54@o2ib4 remote: 0xa4b8c42baf698247 expref: 2375 pid: 102644 timeout: 504087 lvb_type: 0 May 04 07:04:34 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.106.54@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 04 07:04:34 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.106.54@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff984c64386780/0xce885394e31afb1b lrc: 3/0,0 mode: PR/PR res: [0x2c001bd52:0xd3a:0x0].0x0 bits 0x40/0x0 rrc: 46 type: IBT flags: 0x60000400000020 nid: 10.9.106.54@o2ib4 remote: 0xa4b8c42baf698247 expref: 2376 pid: 102644 timeout: 0 lvb_type: 0 May 04 07:04:34 fir-md1-s1 kernel: Lustre: 
103223:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (92:1s); client may timeout. req@ffff9858b7262100 x1631558741053312/t0(0) o101->312870a0-b3a0-7d74-c205-f85bea84ee0b@10.9.105.20@o2ib4:1/0 lens 480/536 e 0 to 0 dl 1556978673 ref 1 fl Complete:/0/0 rc 0/0 May 04 07:04:34 fir-md1-s1 kernel: Lustre: 103223:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages May 04 07:04:37 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556978670/real 1556978670] req@ffff9821c450bc00 x1632303933552192/t0(0) o104->fir-MDT0000@10.9.106.54@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556978677 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 07:04:37 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1248 previous similar messages May 04 07:04:37 fir-md1-s1 kernel: LustreError: 102663:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.106.54@o2ib4) failed to reply to blocking AST (req@ffff9821c450bc00 x1632303933552192 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff985cd6541200/0xce885394e3c17e43 lrc: 4/0,0 mode: PW/PW res: [0x20002182f:0xa5:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.106.54@o2ib4 remote: 0xa4b8c42baf6ab138 expref: 147 pid: 102643 timeout: 504091 lvb_type: 0 May 04 07:04:37 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.106.54@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 04 07:04:37 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.106.54@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff985cd6541200/0xce885394e3c17e43 lrc: 3/0,0 mode: PW/PW res: [0x20002182f:0xa5:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.106.54@o2ib4 remote: 0xa4b8c42baf6ab138 expref: 
148 pid: 102643 timeout: 0 lvb_type: 0 May 04 07:05:40 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client bb56db4a-7c63-607b-b207-ef975fd0a2da (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984cc7d5b800, cur 1556978740 expire 1556978590 last 1556978513 May 04 07:07:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.54@o2ib4) May 04 07:07:26 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 04 07:24:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.13.24@o2ib6) May 04 07:24:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 07:25:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 88c93a95-f5ad-6110-cc93-c62767675453 (at 10.8.13.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983b6c716800, cur 1556979940 expire 1556979790 last 1556979713 May 04 07:54:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4d1ae96a-bfd7-a4f6-e4dd-c47a6ccecff7 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98497a3d2c00, cur 1556981652 expire 1556981502 last 1556981425 May 04 07:54:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 07:54:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4d1ae96a-bfd7-a4f6-e4dd-c47a6ccecff7 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982a8ce77c00, cur 1556981659 expire 1556981509 last 1556981432 May 04 07:56:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.54@o2ib4) May 04 07:56:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 13:31:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:31:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 13:31:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:31:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 13:32:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 767b9887-4949-9409-f873-995c28218891 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9836863a5800, cur 1557001933 expire 1557001783 last 1557001706 May 04 13:32:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 13:35:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:35:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a600d77e-56e0-8ca3-ef8c-c1da33b325f1 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984cb870b800, cur 1557002154 expire 1557002004 last 1557001927 May 04 13:35:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 13:39:56 fir-md1-s1 kernel: Lustre: 102763:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557002389/real 1557002389] req@ffff9838b96ed700 x1632310956739376/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557002396 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 13:40:04 fir-md1-s1 kernel: Lustre: 102353:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983b793d8f00 x1631779537806896/t0(0) o101->b1ac7951-67b3-5d05-244d-b23c643bc210@10.8.24.34@o2ib6:9/0 lens 1784/3288 e 1 to 0 dl 1557002409 ref 2 fl Interpret:/0/0 rc 0/0 May 04 13:40:04 fir-md1-s1 kernel: Lustre: 102353:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages May 04 13:40:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b1ac7951-67b3-5d05-244d-b23c643bc210 (at 10.8.24.34@o2ib6) reconnecting May 04 13:40:10 fir-md1-s1 kernel: Lustre: Skipped 208 previous similar messages May 04 13:40:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.24.34@o2ib6) May 04 13:40:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 13:40:17 fir-md1-s1 kernel: Lustre: 102763:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557002410/real 1557002410] req@ffff9838b96ed700 x1632310956739376/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557002417 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 13:40:17 fir-md1-s1 kernel: Lustre: 102763:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 04 13:40:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b1ac7951-67b3-5d05-244d-b23c643bc210 (at 10.8.24.34@o2ib6) reconnecting May 04 13:40:31 fir-md1-s1 kernel: Lustre: Skipped 7 
previous similar messages May 04 13:40:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.24.34@o2ib6) May 04 13:40:31 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 04 13:40:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b1ac7951-67b3-5d05-244d-b23c643bc210 (at 10.8.24.34@o2ib6) reconnecting May 04 13:40:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 04 13:40:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.24.34@o2ib6) May 04 13:40:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 04 13:40:59 fir-md1-s1 kernel: Lustre: 102763:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557002452/real 1557002452] req@ffff9838b96ed700 x1632310956739376/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557002459 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 13:40:59 fir-md1-s1 kernel: Lustre: 102763:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 04 13:41:10 fir-md1-s1 kernel: Lustre: 102610:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982145b66600 x1631849532761264/t0(0) o101->2943d7c9-ecf1-ed5a-9d88-3a1d89520529@10.8.31.9@o2ib6:15/0 lens 584/3264 e 0 to 0 dl 1557002475 ref 2 fl Interpret:/0/0 rc 0/0 May 04 13:41:10 fir-md1-s1 kernel: Lustre: 102610:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages May 04 13:41:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.24.34@o2ib6) May 04 13:41:13 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 04 13:41:19 fir-md1-s1 kernel: LustreError: 102685:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557002389, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: 
ffff984c62aecc80/0xce885396328dfad7 lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x52ac:0x0].0x0 bits 0x13/0x0 rrc: 10 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102685 timeout: 0 lvb_type: 0 May 04 13:41:19 fir-md1-s1 kernel: LustreError: 102685:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages May 04 13:41:20 fir-md1-s1 kernel: LustreError: 101913:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557002390, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff982213bb0000/0xce885396329371db lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x50cf:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 101913 timeout: 0 lvb_type: 0 May 04 13:41:20 fir-md1-s1 kernel: LustreError: 101913:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages May 04 13:41:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b1ac7951-67b3-5d05-244d-b23c643bc210 (at 10.8.24.34@o2ib6) reconnecting May 04 13:41:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages May 04 13:41:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.31.9@o2ib6) May 04 13:41:48 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages May 04 13:42:15 fir-md1-s1 kernel: LustreError: 102754:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557002445, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9828287eb180/0xce8853963391a5cb lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x50cf:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102754 timeout: 0 lvb_type: 0 May 04 13:42:16 fir-md1-s1 kernel: Lustre: 102510:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: 
[sent 1557002529/real 1557002529] req@ffff983c0aa95a00 x1632310994632576/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557002536 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 13:42:16 fir-md1-s1 kernel: Lustre: 102510:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 22 previous similar messages May 04 13:42:23 fir-md1-s1 kernel: LustreError: 102763:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff9838b96ed700 x1632310956739376 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff983cf4b3a400/0xce885396326571d5 lrc: 4/0,0 mode: PR/PR res: [0x2000216f5:0x52ac:0x0].0x0 bits 0x13/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x295bf93f188360a3 expref: 394 pid: 101902 timeout: 527957 lvb_type: 0 May 04 13:42:23 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 04 13:42:23 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff983872229b00/0xce8853963265a4b8 lrc: 3/0,0 mode: PR/PR res: [0x2000216f5:0x50cf:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x295bf93f188360b8 expref: 395 pid: 102496 timeout: 0 lvb_type: 0 May 04 13:42:23 fir-md1-s1 kernel: LustreError: 102763:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 04 13:43:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f553bf50-c028-a97e-9410-fa196fae757f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9823e1f4a000, cur 1557002601 expire 1557002451 last 1557002374 May 04 13:43:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 13:43:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:43:45 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages May 04 13:48:01 fir-md1-s1 kernel: Lustre: 102480:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557002874/real 1557002874] req@ffff9846dc67c800 x1632311095319440/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557002881 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 13:48:01 fir-md1-s1 kernel: Lustre: 102480:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 04 13:48:19 fir-md1-s1 kernel: Lustre: 102609:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff984428a62a00 x1631335478313536/t0(0) o101->06d24d01-86fd-11a8-6dcf-d16043d84c98@10.8.25.8@o2ib6:24/0 lens 1784/3288 e 0 to 0 dl 1557002904 ref 2 fl Interpret:/0/0 rc 0/0 May 04 13:48:19 fir-md1-s1 kernel: Lustre: 102609:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 04 13:48:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 06d24d01-86fd-11a8-6dcf-d16043d84c98 (at 10.8.25.8@o2ib6) reconnecting May 04 13:48:25 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages May 04 13:48:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.25.8@o2ib6) May 04 13:48:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 13:49:27 fir-md1-s1 kernel: LustreError: 102450:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557002877, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9848ddbf60c0/0xce8853963ac3da73 lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x5287:0x0].0x0 
bits 0x13/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102450 timeout: 0 lvb_type: 0 May 04 13:49:27 fir-md1-s1 kernel: LustreError: 102450:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message May 04 13:49:51 fir-md1-s1 kernel: Lustre: 102478:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9826887edd00 x1631741965901360/t0(0) o101->2148b651-1ee6-12b9-c46c-72caa706afa6@10.8.12.20@o2ib6:26/0 lens 600/3264 e 0 to 0 dl 1557002996 ref 2 fl Interpret:/0/0 rc 0/0 May 04 13:49:51 fir-md1-s1 kernel: Lustre: 102478:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 04 13:50:28 fir-md1-s1 kernel: LustreError: 102480:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff9846dc67c800 x1632311095319440 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff983744bac380/0xce885396390a42fe lrc: 4/0,0 mode: PR/PR res: [0x2000216f5:0x5287:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xacc750d7a07ee910 expref: 405 pid: 102673 timeout: 528442 lvb_type: 0 May 04 13:50:28 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 04 13:50:28 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 04 13:50:28 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff983744bac380/0xce885396390a42fe lrc: 3/0,0 mode: PR/PR res: [0x2000216f5:0x5287:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xacc750d7a07ee910 expref: 406 pid: 102673 timeout: 0 lvb_type: 0 May 04 13:51:21 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: haven't heard from client 62239d4a-6992-768d-a1cc-2d98661c092d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98389fa7c400, cur 1557003081 expire 1557002931 last 1557002854 May 04 13:51:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 13:55:34 fir-md1-s1 kernel: Lustre: 102644:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557003326/real 1557003326] req@ffff984696f25a00 x1632311225769760/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557003333 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 13:55:34 fir-md1-s1 kernel: Lustre: 102644:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages May 04 13:55:51 fir-md1-s1 kernel: Lustre: 102492:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983f71a2da00 x1631676278915184/t0(0) o36->fdc52cb1-2cc9-0d98-0f2b-dc082fc53acc@10.8.8.37@o2ib6:26/0 lens 512/448 e 0 to 0 dl 1557003356 ref 2 fl Interpret:/0/0 rc 0/0 May 04 13:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fdc52cb1-2cc9-0d98-0f2b-dc082fc53acc (at 10.8.8.37@o2ib6) reconnecting May 04 13:55:58 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 04 13:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.8.37@o2ib6) May 04 13:55:58 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages May 04 13:56:59 fir-md1-s1 kernel: LustreError: 102401:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557003329, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff982b31742880/0xce88539642422c9e lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x17fb:0x0].0x0 bits 0x13/0x0 rrc: 88 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102401 timeout: 0 lvb_type: 0 May 04 13:56:59 fir-md1-s1 
kernel: LustreError: 102401:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message May 04 13:57:07 fir-md1-s1 kernel: LustreError: 102403:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557003337, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff983b72fbfbc0/0xce8853964265bf2e lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x17fb:0x0].0x0 bits 0x13/0x0 rrc: 93 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102403 timeout: 0 lvb_type: 0 May 04 13:57:07 fir-md1-s1 kernel: LustreError: 102403:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 25 previous similar messages May 04 13:57:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7f2bb20f-4f02-afda-1d89-8b69b6f31ac2 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cdd15f400, cur 1557003436 expire 1557003286 last 1557003209 May 04 13:57:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 13:57:22 fir-md1-s1 kernel: LustreError: 102663:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff981e1759f500 x1632311255952640 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9823a6bfd340/0xce8853963f334776 lrc: 4/0,0 mode: PW/PW res: [0x200022077:0x2:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0xb95c601a9d11b928 expref: 419 pid: 102394 timeout: 0 lvb_type: 0 May 04 13:57:22 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 04 13:57:22 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 116s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: 
ffff983c8677af40/0xce8853963f1c3d32 lrc: 3/0,0 mode: PR/PR res: [0x2000216f5:0x17fb:0x0].0x0 bits 0x13/0x0 rrc: 99 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xb95c601a9d10ddc0 expref: 420 pid: 102586 timeout: 0 lvb_type: 0 May 04 13:57:22 fir-md1-s1 kernel: Lustre: 102578:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (92:1s); client may timeout. req@ffff98378628e600 x1631560446298960/t0(0) o101->9b7917ef-4055-daa1-69c4-53b2ed51bc97@10.8.7.27@o2ib6:19/0 lens 576/624 e 0 to 0 dl 1557003441 ref 1 fl Complete:/0/0 rc 0/0 May 04 13:57:22 fir-md1-s1 kernel: LustreError: 102663:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 3 previous similar messages May 04 14:02:40 fir-md1-s1 kernel: Lustre: 102379:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982f08e1c500 x1631835098112352/t0(0) o101->b4a7dae6-d345-8e9f-e334-9f38da541ca7@10.8.24.33@o2ib6:15/0 lens 1784/3288 e 0 to 0 dl 1557003765 ref 2 fl Interpret:/0/0 rc 0/0 May 04 14:02:40 fir-md1-s1 kernel: Lustre: 102379:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 87 previous similar messages May 04 14:02:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b4a7dae6-d345-8e9f-e334-9f38da541ca7 (at 10.8.24.33@o2ib6) reconnecting May 04 14:02:46 fir-md1-s1 kernel: Lustre: Skipped 181 previous similar messages May 04 14:03:50 fir-md1-s1 kernel: LustreError: 102586:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557003739, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff98226e073a80/0xce88539648d87e1c lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x5857:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102586 timeout: 0 lvb_type: 0 May 04 14:03:50 fir-md1-s1 kernel: LustreError: 102586:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) 
Skipped 17 previous similar messages May 04 14:04:50 fir-md1-s1 kernel: LustreError: 102425:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557003800, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff98280eaf8d80/0xce88539649c31ce6 lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x583f:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102425 timeout: 0 lvb_type: 0 May 04 14:04:50 fir-md1-s1 kernel: LustreError: 102425:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message May 04 14:04:50 fir-md1-s1 kernel: LustreError: 102454:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff983a29602100 x1632311328885104 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9837e7348240/0xce88539648a891bc lrc: 4/0,0 mode: PR/PR res: [0x2000216f5:0x5857:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7e98dabd36ccc785 expref: 336 pid: 102379 timeout: 529280 lvb_type: 0 May 04 14:04:50 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 04 14:04:50 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages May 04 14:04:50 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff9837e7348240/0xce88539648a891bc lrc: 3/0,0 mode: PR/PR res: [0x2000216f5:0x5857:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7e98dabd36ccc785 expref: 337 pid: 102379 timeout: 0 lvb_type: 0 May 04 14:04:50 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar 
message May 04 14:04:50 fir-md1-s1 kernel: Lustre: 102454:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:1s); client may timeout. req@ffff982f08e1c500 x1631835098112352/t309956562259(0) o101->b4a7dae6-d345-8e9f-e334-9f38da541ca7@10.8.24.33@o2ib6:15/0 lens 1784/1296 e 0 to 0 dl 1557003889 ref 1 fl Complete:/0/0 rc 0/0 May 04 14:04:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 14:04:57 fir-md1-s1 kernel: Lustre: Skipped 200 previous similar messages May 04 14:05:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7dedc3aa-cf85-daf7-9fd3-ada2338e6ac1 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98378968cc00, cur 1557003948 expire 1557003798 last 1557003721 May 04 14:05:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 14:10:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 72196ffb-9a76-4842-d2d7-d8802bd689be (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9848a8f43400, cur 1557004203 expire 1557004053 last 1557003976 May 04 14:10:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 14:17:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d53c58de-f8cb-7814-75c1-fc52447be8d8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9822a8204000, cur 1557004669 expire 1557004519 last 1557004442 May 04 14:17:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 14:18:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 14:18:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 04 16:02:14 fir-md1-s1 kernel: Lustre: 102654:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557010927/real 1557010927] req@ffff98320b390000 x1632313344883584/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557010934 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 16:02:14 fir-md1-s1 kernel: Lustre: 102654:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 36 previous similar messages May 04 16:02:22 fir-md1-s1 kernel: Lustre: 102736:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98367f278600 x1631547019158272/t0(0) o36->107f67c0-fd04-e8c4-a5b4-517b692522f2@10.8.7.2@o2ib6:27/0 lens 488/2888 e 1 to 0 dl 1557010947 ref 2 fl Interpret:/0/0 rc 0/0 May 04 16:02:22 fir-md1-s1 kernel: Lustre: 102736:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages May 04 16:02:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 107f67c0-fd04-e8c4-a5b4-517b692522f2 (at 10.8.7.2@o2ib6) reconnecting May 04 16:02:28 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages May 04 16:02:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.7.2@o2ib6) May 04 16:02:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 16:02:42 fir-md1-s1 kernel: LustreError: 102654:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff98320b390000 x1632313344883584 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9831e8707980/0xce8853968f5516d7 lrc: 4/0,0 mode: PR/PR res: [0x2c001a69e:0xd54:0x0].0x0 
bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7ce868bc0c993697 expref: 3807 pid: 102505 timeout: 536256 lvb_type: 0 May 04 16:02:42 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 04 16:02:42 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9831e8707980/0xce8853968f5516d7 lrc: 3/0,0 mode: PR/PR res: [0x2c001a69e:0xd54:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7ce868bc0c993697 expref: 3808 pid: 102505 timeout: 0 lvb_type: 0 May 04 16:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 46347b9e-ebe9-507e-4869-8b66fcad2413 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ec5e8d000, cur 1557011132 expire 1557010982 last 1557010905 May 04 16:05:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 16:09:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 16:09:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 16:33:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7444eeb8-aa34-ee96-aeea-02f153ed19ac (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983287ea5400, cur 1557012784 expire 1557012634 last 1557012557 May 04 16:33:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 16:33:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 16:33:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 16:37:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 16:37:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 16:37:15 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d2c8fb66-5325-6715-1869-960b22666d56 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98538ff07000, cur 1557013035 expire 1557012885 last 1557012808 May 04 16:37:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 16:37:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 577d49cc-46ec-97b8-1139-c4b0a8b80cef (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983228f11400, cur 1557013039 expire 1557012889 last 1557012812 May 04 16:46:21 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557013574/real 1557013574] req@ffff9828e4a4e000 x1632314095498224/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557013581 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 16:46:21 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 04 16:46:29 fir-md1-s1 kernel: Lustre: 102593:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9823c8a60000 x1632098276508528/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:4/0 lens 480/568 e 1 to 0 dl 1557013594 ref 2 fl Interpret:/0/0 rc 0/0 May 04 16:46:29 fir-md1-s1 kernel: Lustre: 102593:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 04 16:46:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 04 16:46:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 16:46:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 04 16:46:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 16:46:35 fir-md1-s1 kernel: Lustre: 102483:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557013588/real 1557013588] req@ffff982526f80000 x1632314095502032/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557013595 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 16:46:35 fir-md1-s1 kernel: Lustre: 102483:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 04 16:46:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 04 
16:46:56 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557013609/real 1557013609] req@ffff9828e4a4e000 x1632314095498224/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557013616 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 16:46:56 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 04 16:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 04 16:47:38 fir-md1-s1 kernel: Lustre: 102483:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557013651/real 1557013651] req@ffff982526f80000 x1632314095502032/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557013658 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 16:47:38 fir-md1-s1 kernel: Lustre: 102483:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 04 16:47:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 04 16:47:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 16:47:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 04 16:47:59 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 04 16:48:55 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557013728/real 1557013728] req@ffff9828e4a4e000 x1632314095498224/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557013735 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 16:48:55 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages May 04 16:49:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 04 16:49:23 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 04 16:49:35 fir-md1-s1 kernel: LNet: Service thread pid 102483 was inactive for 200.24s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: May 04 16:49:35 fir-md1-s1 kernel: Pid: 102483, comm: mdt00_042 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 May 04 16:49:35 fir-md1-s1 kernel: Call Trace: May 04 16:49:35 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] May 04 16:49:35 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] May 04 16:49:35 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] May 04 16:49:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 04 16:49:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 04 16:49:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 04 16:49:35 fir-md1-s1 kernel: [] 0xffffffffffffffff May 04 16:49:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557013775.102483 May 04 16:49:35 fir-md1-s1 kernel: Pid: 102731, comm: mdt00_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 May 04 16:49:35 fir-md1-s1 kernel: Call Trace: May 
04 16:49:35 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] May 04 16:49:35 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] May 04 16:49:35 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] May 04 16:49:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 04 16:49:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 04 16:49:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 04 16:49:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 04 16:49:35 fir-md1-s1 kernel: [] 0xffffffffffffffff May 04 16:49:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f7224363-eb38-b009-9e90-8b8b47f8518e (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982eb0a3ac00, cur 1557013789 expire 1557013639 last 1557013562 May 04 16:49:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 16:49:49 fir-md1-s1 kernel: LNet: Service thread pid 102731 completed after 214.23s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
May 04 16:49:49 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message May 04 16:59:26 fir-md1-s1 kernel: LustreError: 102700:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff982131a9c200 x1632314317171904 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff983cd6809b00/0xce88539703928da6 lrc: 4/0,0 mode: PW/PW res: [0x2c0024012:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x79c1b857237306b8 expref: 39 pid: 102353 timeout: 0 lvb_type: 0 May 04 16:59:26 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 04 16:59:26 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 301s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff983cd6809b00/0xce88539703928da6 lrc: 4/0,0 mode: PW/PW res: [0x2c0024012:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x79c1b857237306b8 expref: 40 pid: 102353 timeout: 0 lvb_type: 0 May 04 16:59:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dde38836-d5bf-8e01-0932-9c2ac1c28821 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983ab857a400, cur 1557014382 expire 1557014232 last 1557014155 May 04 16:59:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 17:00:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:00:17 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 04 17:15:29 fir-md1-s1 kernel: LNetError: 101321:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 17:25:33 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557015926/real 1557015926] req@ffff98210fab8600 x1632314752850000/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557015933 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 17:25:33 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 15 previous similar messages May 04 17:25:41 fir-md1-s1 kernel: Lustre: 102663:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982ab53aa700 x1632098361791744/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:16/0 lens 480/568 e 1 to 0 dl 1557015946 ref 2 fl Interpret:/0/0 rc 0/0 May 04 17:25:41 fir-md1-s1 kernel: Lustre: 102663:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 04 17:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 04 17:25:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 17:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 04 17:25:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 17:25:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 93205d60-586e-3744-ce33-b1b459643386 (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff982961e79400, cur 1557015950 expire 1557015800 last 1557015723 May 04 17:25:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 17:33:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 511a286a-7555-3179-23d0-2620081196ae (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98354ea3f800, cur 1557016394 expire 1557016244 last 1557016167 May 04 17:33:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 17:33:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 511a286a-7555-3179-23d0-2620081196ae (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cc5e26400, cur 1557016398 expire 1557016248 last 1557016171 May 04 17:33:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 17:33:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:33:34 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 04 17:44:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 54ad9af7-50a0-90a2-7f16-e45b723f531c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847e672ac00, cur 1557017096 expire 1557016946 last 1557016869 May 04 17:45:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:45:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 17:54:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1239aa75-3a7b-000a-099b-e23fd7e6fded (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98382bf1bc00, cur 1557017648 expire 1557017498 last 1557017421 May 04 17:54:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 17:57:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:57:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:07:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cda96f87-639d-9cf0-7897-0ebd87a2d1c8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983526691000, cur 1557018455 expire 1557018305 last 1557018228 May 04 18:07:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:07:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 18:07:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:17:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 18:17:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:18:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 52867129-fbd8-a539-d444-366a85e4c09b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98472568d800, cur 1557019085 expire 1557018935 last 1557018858 May 04 18:18:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:27:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 18:27:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4217312c-de7d-699c-df37-b7b9ed79d1d2 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983581fcd000, cur 1557019660 expire 1557019510 last 1557019433 May 04 18:27:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2c083488-2dde-8148-1c71-2773a71ca78d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983593e95000, cur 1557020139 expire 1557019989 last 1557019912 May 04 18:35:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:35:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 18:35:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 04b19bfb-73c4-35aa-ab54-95d0fb967def (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98295c903c00, cur 1557021303 expire 1557021153 last 1557021076 May 04 18:55:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 18:55:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 18:55:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 19:08:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 44d99712-0a32-8224-fa27-023394f7a2e7 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9835e6f93c00, cur 1557022095 expire 1557021945 last 1557021868 May 04 19:08:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 19:08:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 19:08:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 20:00:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 924721f2-8b7a-d9e8-9d84-6d74a6d68bc4 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98226cd8c000, cur 1557025247 expire 1557025097 last 1557025020 May 04 20:00:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 20:01:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 20:01:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 21:02:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 14d8e923-c6f9-c89f-2108-7e07133a0534 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847d4645c00, cur 1557028963 expire 1557028813 last 1557028736 May 04 21:02:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 21:03:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 14d8e923-c6f9-c89f-2108-7e07133a0534 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982cb058c400, cur 1557028983 expire 1557028833 last 1557028756 May 04 21:03:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 21:03:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:03:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 21:12:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d92925b0-6950-d73a-3c19-18f60b71f77c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff981ec2ab5800, cur 1557029551 expire 1557029401 last 1557029324 May 04 21:12:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:12:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 21:42:13 fir-md1-s1 kernel: Lustre: 102663:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557031326/real 1557031326] req@ffff98226cf32700 x1632319145807984/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557031333 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 21:42:13 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557031326/real 1557031326] req@ffff9823c3e02400 x1632319145807712/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557031333 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 21:42:13 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 04 21:42:20 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557031333/real 1557031333] req@ffff9823c3e02400 x1632319145807712/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557031340 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 21:42:27 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557031340/real 1557031340] req@ffff9823c3e02400 x1632319145807712/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557031347 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 21:42:27 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 21:42:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b10ee8b8-e94f-f5ea-853b-6331b42e15b7 (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff982294053000, cur 1557031350 expire 1557031200 last 1557031123 May 04 21:42:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 21:42:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:42:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 21:49:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ef06e220-e09f-677f-2fdb-acf8938b16d5 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9831be709c00, cur 1557031774 expire 1557031624 last 1557031547 May 04 21:49:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 21:49:35 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client df7d19ad-d25c-c9d8-2c8e-215559fb9bc2 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983ee76c5000, cur 1557031775 expire 1557031625 last 1557031548 May 04 21:49:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 21:49:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:49:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 21:59:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2bf243cb-6f67-e42b-51f4-f6a7c5ee982b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982c64bca800, cur 1557032354 expire 1557032204 last 1557032127 May 04 21:59:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2bf243cb-6f67-e42b-51f4-f6a7c5ee982b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984cacf82400, cur 1557032366 expire 1557032216 last 1557032139 May 04 21:59:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 21:59:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:59:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 22:07:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 22:07:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 22:07:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 480555d5-5d64-d073-3639-0f5b2ce35e26 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9821f834a000, cur 1557032857 expire 1557032707 last 1557032630 May 04 22:21:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7d1b2398-9a4a-11b4-6f97-80af175ca4ee (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983a83b85800, cur 1557033708 expire 1557033558 last 1557033481 May 04 22:21:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 22:24:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 22:24:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 22:47:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f1e8f696-2dcb-fb2e-d28b-ace730b2fdeb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9838fc77d800, cur 1557035260 expire 1557035110 last 1557035033 May 04 22:47:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 22:48:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 22:48:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 23:38:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ec44671f-3d65-fa77-ff7c-0f9f8180fb37 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98321a2dd000, cur 1557038331 expire 1557038181 last 1557038104 May 04 23:38:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 04 23:39:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ec44671f-3d65-fa77-ff7c-0f9f8180fb37 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9842f373b400, cur 1557038350 expire 1557038200 last 1557038123 May 04 23:39:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 04 23:44:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 04 23:44:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 00:05:41 fir-md1-s1 kernel: Lustre: 102700:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557039934/real 1557039934] req@ffff982029a4b600 x1632321176682176/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557039941 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 00:05:41 fir-md1-s1 kernel: Lustre: 102700:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 00:05:48 fir-md1-s1 kernel: Lustre: 102628:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557039941/real 1557039941] req@ffff982c7b3aef00 x1632321176682384/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557039948 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 00:05:49 fir-md1-s1 kernel: Lustre: 102481:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9827733d0000 x1632099089108848/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:24/0 lens 480/568 e 1 to 0 dl 1557039954 ref 2 fl Interpret:/0/0 rc 0/0 May 05 00:05:49 fir-md1-s1 kernel: Lustre: 102481:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 00:05:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 00:05:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 00:05:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 00:05:55 fir-md1-s1 kernel: Lustre: 102628:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557039948/real 1557039948] req@ffff982c7b3aef00 x1632321176682384/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557039955 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 00:05:55 fir-md1-s1 kernel: Lustre: 102628:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 00:06:02 fir-md1-s1 kernel: Lustre: 102700:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557039955/real 1557039955] req@ffff982029a4b600 x1632321176682176/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557039962 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 00:06:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 00:06:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 00:06:16 fir-md1-s1 kernel: Lustre: 102628:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557039969/real 1557039969] req@ffff982c7b3aef00 x1632321176682384/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557039976 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 00:06:16 fir-md1-s1 kernel: Lustre: 102628:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 05 00:06:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ac8fb02-ddf9-28e4-b563-927c1f401f4a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff981cde70f000, cur 1557039982 expire 1557039832 last 1557039755 May 05 00:06:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 00:15:24 fir-md1-s1 kernel: Lustre: 102754:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982374bb2100 x1632099102103824/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:29/0 lens 480/568 e 1 to 0 dl 1557040529 ref 2 fl Interpret:/0/0 rc 0/0 May 05 00:15:24 fir-md1-s1 kernel: Lustre: 102754:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 00:15:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 00:15:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 00:15:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 00:15:40 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557040509/real 1557040509] req@ffff982c7b3aa100 x1632321334674368/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557040540 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 00:15:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 00:15:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 00:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 46ec5507-6f4e-bc2f-33b6-50ca0b2ab0b3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983b7c2a7400, cur 1557040562 expire 1557040412 last 1557040335 May 05 00:16:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 00:16:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 46ec5507-6f4e-bc2f-33b6-50ca0b2ab0b3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982e9423f800, cur 1557040569 expire 1557040419 last 1557040342 May 05 00:16:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 00:21:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 00:59:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 65f3208b-c8c0-19d5-f74e-5b534d0ff340 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9843fbfbcc00, cur 1557043172 expire 1557043022 last 1557042945 May 05 00:59:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 00:59:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:06:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 672caf69-51af-50bf-c8ba-5b4846099a5d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9822b6682000, cur 1557043567 expire 1557043417 last 1557043340 May 05 01:06:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:06:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:06:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:14:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ab036b1d-a85e-d27e-e5fc-7955b853ddcc (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98382eab4000, cur 1557044062 expire 1557043912 last 1557043835 May 05 01:14:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:14:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ab036b1d-a85e-d27e-e5fc-7955b853ddcc (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983ccf653800, cur 1557044078 expire 1557043928 last 1557043851 May 05 01:14:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 01:14:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:14:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:22:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 33446197-b79e-e7f3-0d84-b0bbf23bc42f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98397afd4800, cur 1557044525 expire 1557044375 last 1557044298 May 05 01:22:07 fir-md1-s1 kernel: Lustre: 101750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557044520/real 1557044520] req@ffff981d24e89b00 x1632322442586288/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557044527 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 01:22:14 fir-md1-s1 kernel: Lustre: 101750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557044527/real 1557044527] req@ffff981d24e89b00 x1632322442586288/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557044534 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 01:22:15 fir-md1-s1 kernel: Lustre: 102509:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982ccc174200 x1632099199371040/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:20/0 lens 480/568 e 1 to 0 dl 1557044540 ref 2 fl Interpret:/0/0 rc 0/0 May 05 01:22:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 01:22:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 01:22:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:22:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 33446197-b79e-e7f3-0d84-b0bbf23bc42f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983b83f33000, cur 1557044544 expire 1557044394 last 1557044317 May 05 01:22:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 01:25:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:40:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17a7a482-1271-33ea-b064-feb50332109c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9822fa5a8400, cur 1557045604 expire 1557045454 last 1557045377 May 05 01:40:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:40:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:43:53 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557045826/real 1557045826] req@ffff985cadb82700 x1632322785833920/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557045833 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 01:43:53 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 01:44:00 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557045833/real 1557045833] req@ffff985cadb82700 x1632322785833920/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557045840 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 01:44:01 fir-md1-s1 kernel: Lustre: 
102527:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982a08276900 x1632099224867376/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:6/0 lens 480/568 e 1 to 0 dl 1557045846 ref 2 fl Interpret:/0/0 rc 0/0 May 05 01:44:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 01:44:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 01:44:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:44:07 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557045840/real 1557045840] req@ffff985cadb82700 x1632322785833920/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557045847 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 01:44:14 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557045847/real 1557045847] req@ffff985cadb82700 x1632322785833920/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557045854 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 01:44:28 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557045861/real 1557045861] req@ffff985cadb82700 x1632322785833920/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557045868 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 01:44:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 01:44:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 01:44:28 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 01:44:50 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 01:44:50 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557045882/real 1557045882] req@ffff985cadb82700 x1632322785833920/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557045889 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 01:44:50 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 01:44:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 01:45:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 01:45:31 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557045924/real 1557045924] req@ffff985cadb82700 x1632322785833920/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557045931 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 01:45:31 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 01:45:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 01:45:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 01:45:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 01:45:53 fir-md1-s1 kernel: LustreError: 102456:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff985cadb82700 x1632322785833920 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff984c783f7500/0xce885399ecc7ac49 lrc: 4/0,0 mode: PW/PW res: [0x2c0024028:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 
10.8.27.23@o2ib6 remote: 0x5ca2390bbd474931 expref: 41 pid: 102726 timeout: 0 lvb_type: 0 May 05 01:45:53 fir-md1-s1 kernel: LustreError: 102456:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 05 01:45:53 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 05 01:45:53 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 05 01:45:53 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 227s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff984c783f7500/0xce885399ecc7ac49 lrc: 4/0,0 mode: PW/PW res: [0x2c0024028:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x5ca2390bbd474931 expref: 42 pid: 102726 timeout: 0 lvb_type: 0 May 05 01:45:53 fir-md1-s1 kernel: Lustre: 102456:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (126:1s); client may timeout. req@ffff982a08276900 x1632099224867376/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:6/0 lens 480/536 e 1 to 0 dl 1557045952 ref 1 fl Complete:/0/0 rc 301/301 May 05 01:46:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 101f9e20-27fc-7a6d-215b-5bbd3ac1ba7a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cb463a400, cur 1557046011 expire 1557045861 last 1557045784 May 05 01:46:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 01:53:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e7ca7b5c-b991-452d-a104-2b90c14ee66b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984a65b72800, cur 1557046404 expire 1557046254 last 1557046177 May 05 01:53:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 01:53:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e7ca7b5c-b991-452d-a104-2b90c14ee66b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9828ecf9f000, cur 1557046412 expire 1557046262 last 1557046185 May 05 01:53:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 01:53:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:53:43 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 05 02:49:56 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 03:30:12 fir-md1-s1 kernel: Lustre: 102700:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557052205/real 1557052205] req@ffff9829f1e9d700 x1632324419021888/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557052212 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 03:30:12 fir-md1-s1 kernel: Lustre: 102700:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 05 03:30:20 fir-md1-s1 kernel: Lustre: 101683:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9859d20dec00 x1632099355335104/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:25/0 lens 480/568 e 1 to 0 dl 1557052225 ref 2 fl Interpret:/0/0 rc 0/0 May 05 03:30:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ce9e113d-e8f1-e106-02dd-4c3be89e6b69 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9835a9e1e800, cur 1557052223 expire 1557052073 last 1557051996 May 05 03:30:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 03:30:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 04:11:00 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 06:34:28 fir-md1-s1 kernel: LNetError: 101318:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 07:46:09 fir-md1-s1 kernel: LNetError: 101318:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 08:52:16 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 08:52:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fe6e0c95-4050-3263-6441-0c8bd9611956 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9849087e9400, cur 1557071538 expire 1557071388 last 1557071311 May 05 08:52:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 08:52:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 05 08:52:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 09:14:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 55455114-9841-d0dd-10fa-08a14775789b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983214e34400, cur 1557072852 expire 1557072702 last 1557072625 May 05 09:14:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 09:14:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 05 09:14:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 09:32:45 fir-md1-s1 kernel: Lustre: 102459:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 05 09:32:45 fir-md1-s1 kernel: Lustre: 102459:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 401 previous similar messages May 05 10:05:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 10867627-0a3b-0ced-8cf7-ab7628cfde78 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984979ff7400, cur 1557075955 expire 1557075805 last 1557075728 May 05 10:05:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 11:28:35 fir-md1-s1 kernel: LNetError: 101319:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 12:05:44 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557083137/real 1557083137] req@ffff98217a815a00 x1632331695279488/t0(0) o106->fir-MDT0002@10.8.14.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557083144 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 12:05:44 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 05 12:05:51 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557083144/real 1557083144] req@ffff98217a815a00 x1632331695279488/t0(0) o106->fir-MDT0002@10.8.14.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557083151 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 12:05:52 fir-md1-s1 kernel: Lustre: 
102567:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9827e2b14800 x1632099835336320/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:27/0 lens 480/568 e 1 to 0 dl 1557083157 ref 2 fl Interpret:/0/0 rc 0/0 May 05 12:05:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 12:05:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 12:05:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 12:05:58 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557083151/real 1557083151] req@ffff98217a815a00 x1632331695279488/t0(0) o106->fir-MDT0002@10.8.14.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557083158 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 12:06:05 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557083158/real 1557083158] req@ffff98217a815a00 x1632331695279488/t0(0) o106->fir-MDT0002@10.8.14.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557083165 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 12:06:19 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557083172/real 1557083172] req@ffff98217a815a00 x1632331695279488/t0(0) o106->fir-MDT0002@10.8.14.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557083179 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 12:06:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 12:06:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 12:06:19 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 12:06:40 fir-md1-s1 kernel: 
Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 12:06:40 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557083193/real 1557083193] req@ffff98217a815a00 x1632331695279488/t0(0) o106->fir-MDT0002@10.8.14.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557083200 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 12:06:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 12:06:40 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 12:07:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 12:07:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 12:07:22 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557083235/real 1557083235] req@ffff98217a815a00 x1632331695279488/t0(0) o106->fir-MDT0002@10.8.14.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557083242 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 12:07:22 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 12:07:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 12:07:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 12:07:41 fir-md1-s1 kernel: LustreError: 102473:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.14.2@o2ib6) returned error from glimpse AST (req@ffff98217a815a00 x1632331695279488 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff98353f2118c0/0xce88539f66b8c299 lrc: 4/0,0 mode: PW/PW res: [0x2c00237f7:0x1b3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 
0x40200000000000 nid: 10.8.14.2@o2ib6 remote: 0xcbbc279818d26632 expref: 15 pid: 102550 timeout: 0 lvb_type: 0 May 05 12:07:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.14.2@o2ib6) May 05 12:07:41 fir-md1-s1 kernel: LustreError: 102473:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 05 12:07:41 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.14.2@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 05 12:07:41 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 05 12:07:41 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 425s: evicting client at 10.8.14.2@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff98353f2118c0/0xce88539f66b8c299 lrc: 4/0,0 mode: PW/PW res: [0x2c00237f7:0x1b3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.14.2@o2ib6 remote: 0xcbbc279818d26632 expref: 16 pid: 102550 timeout: 0 lvb_type: 0 May 05 12:08:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 46e0bcf2-fd2a-2954-a40d-64dcbd9e1b39 (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9846712fd400, cur 1557083311 expire 1557083161 last 1557083084 May 05 12:08:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 13:29:21 fir-md1-s1 kernel: LNetError: 101318:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 14:58:26 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 16:01:52 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 16:45:43 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 16:46:11 fir-md1-s1 kernel: LNetError: 101319:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 17:00:17 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100810/real 1557100810] req@ffff9822d29b3c00 x1632335910016000/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557100817 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 17:00:17 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 17:00:25 fir-md1-s1 kernel: Lustre: 102624:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982285165d00 x1632100247056736/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:0/0 lens 480/568 e 1 to 0 dl 1557100830 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:00:31 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100824/real 1557100824] req@ffff9822d29b3c00 x1632335910016000/t0(0) 
o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557100831 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:00:31 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 17:00:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 17:00:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 17:00:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 17:00:52 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100845/real 1557100845] req@ffff9822d29b3c00 x1632335910016000/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557100852 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:00:52 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 17:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 17:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 17:01:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 17:01:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 17:01:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 17:01:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 17:01:34 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100887/real 1557100887] req@ffff9822d29b3c00 x1632335910016000/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 
0 to 1 dl 1557100894 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:01:34 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 17:01:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 17:01:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 17:02:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 17:02:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 17:02:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 17:02:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 17:02:51 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100964/real 1557100964] req@ffff9822d29b3c00 x1632335910016000/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557100971 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:02:51 fir-md1-s1 kernel: Lustre: 102567:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 05 17:03:30 fir-md1-s1 kernel: LNet: Service thread pid 102567 was inactive for 200.35s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: May 05 17:03:30 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message May 05 17:03:30 fir-md1-s1 kernel: Pid: 102567, comm: mdt00_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 May 05 17:03:30 fir-md1-s1 kernel: Call Trace: May 05 17:03:30 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] May 05 17:03:30 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] May 05 17:03:30 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] May 05 17:03:30 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 05 17:03:30 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 05 17:03:30 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 05 17:03:30 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 05 17:03:30 fir-md1-s1 kernel: [] 0xffffffffffffffff May 05 17:03:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557101010.102567 May 05 17:03:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 314b365e-14a5-70c7-25e5-6bc358cba3f0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9839f5ee1800, cur 1557101030 expire 1557100880 last 1557100803 May 05 17:03:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 17:03:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 314b365e-14a5-70c7-25e5-6bc358cba3f0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984748b1d800, cur 1557101036 expire 1557100886 last 1557100809 May 05 17:03:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 17:03:56 fir-md1-s1 kernel: LNet: Service thread pid 102567 completed after 225.75s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 05 17:04:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 17:04:06 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 05 17:07:17 fir-md1-s1 kernel: LNetError: 101318:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 17:13:27 fir-md1-s1 kernel: Lustre: 102406:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557101600/real 1557101600] req@ffff983caa32e600 x1632336002028432/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557101607 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 17:13:27 fir-md1-s1 kernel: Lustre: 102406:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 05 17:13:35 fir-md1-s1 kernel: Lustre: 101711:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983a16ed0300 x1631546335799616/t0(0) o101->c5d29146-8e69-99bb-85ae-0e928604facc@10.8.0.68@o2ib6:10/0 lens 480/568 e 1 to 0 dl 1557101620 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c5d29146-8e69-99bb-85ae-0e928604facc (at 10.8.0.68@o2ib6) reconnecting May 05 17:13:41 fir-md1-s1 kernel: 
Lustre: Skipped 3 previous similar messages May 05 17:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e89dcc01-5b37-54c0-22e3-05cfd841e08e (at 10.8.0.68@o2ib6) May 05 17:13:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 17:15:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e610d38c-5bd4-9ae6-45c1-3312b300f5a9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9823a0e8fc00, cur 1557101747 expire 1557101597 last 1557101520 May 05 17:15:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e610d38c-5bd4-9ae6-45c1-3312b300f5a9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982fd6e20000, cur 1557101757 expire 1557101607 last 1557101530 May 05 17:15:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 17:22:54 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557102167/real 1557102167] req@ffff9822a1f8fb00 x1632336012229936/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557102174 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 17:22:54 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 24 previous similar messages May 05 17:23:12 fir-md1-s1 kernel: Lustre: 102485:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982931320000 x1632100292058304/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:17/0 lens 480/568 e 0 to 0 dl 1557102197 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:23:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 17:23:18 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 05 17:23:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 
17:23:18 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 05 17:23:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e0d1e5c9-c6ad-9c5f-0cfb-a4429801fddb (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983aa2fe5000, cur 1557102222 expire 1557102072 last 1557101995 May 05 17:26:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 14597763-6ba9-eacb-a6fa-2c21e3eae766 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9826f2ee3000, cur 1557102380 expire 1557102230 last 1557102153 May 05 17:26:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 17:30:12 fir-md1-s1 kernel: Lustre: 102717:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9836aaa4ad00 x1631560547253488/t0(0) o36->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:17/0 lens 520/2888 e 0 to 0 dl 1557102617 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:30:17 fir-md1-s1 kernel: Lustre: 103224:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9857e262e900 x1631560739614208/t0(0) o101->0db2d4e0-bf1e-3689-817d-00b10dcb4858@10.9.102.20@o2ib4:22/0 lens 576/3264 e 1 to 0 dl 1557102622 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:30:17 fir-md1-s1 kernel: Lustre: 103224:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 17:30:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 1135836c-5fb6-92af-ade3-8ef6cf526018 (at 10.8.27.9@o2ib6) reconnecting May 05 17:30:21 fir-md1-s1 kernel: Lustre: 102617:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9846a164bc00 x1631560739614256/t0(0) o101->0db2d4e0-bf1e-3689-817d-00b10dcb4858@10.9.102.20@o2ib4:26/0 lens 576/3264 e 1 to 0 dl 1557102626 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:30:34 fir-md1-s1 kernel: 
LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 17:31:17 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 190a7ace-f44e-21af-e979-be4a353481c1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984a7b3e6800, cur 1557102677 expire 1557102527 last 1557102450 May 05 17:31:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 17:31:28 fir-md1-s1 kernel: LustreError: 102543:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557102598, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff98481ca7e780/0xce8853a313204993 lrc: 3/1,0 mode: --/PR res: [0x2c00130be:0x1bf8:0x0].0x0 bits 0x13/0x0 rrc: 67 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102543 timeout: 0 lvb_type: 0 May 05 17:31:36 fir-md1-s1 kernel: LustreError: 102533:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557102606, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff984c94bf3600/0xce8853a3139fda7d lrc: 3/1,0 mode: --/PR res: [0x2c00130be:0x1bf8:0x0].0x0 bits 0x13/0x0 rrc: 67 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102533 timeout: 0 lvb_type: 0 May 05 17:31:36 fir-md1-s1 kernel: LustreError: 102533:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message May 05 17:31:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bd7325a9-b1c4-77a5-13a8-c3ddd4816d3e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9837f2202800, cur 1557102697 expire 1557102547 last 1557102470 May 05 17:31:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 17:31:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.27.9@o2ib6) May 05 17:31:51 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages May 05 17:31:55 fir-md1-s1 kernel: Lustre: 102592:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983ca3c5f200 x1631536548101328/t0(0) o36->d3c03fa2-3e41-4741-cf2d-21c94adb10e5@10.9.108.40@o2ib4:0/0 lens 520/2888 e 0 to 0 dl 1557102720 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:31:55 fir-md1-s1 kernel: Lustre: 102592:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 17:32:22 fir-md1-s1 kernel: LustreError: 102371:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff9838b6af2a00 x1632336018960176 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff984a6db89b00/0xce8853a3107838d9 lrc: 4/0,0 mode: PR/PR res: [0x2c00130be:0x1bf8:0x0].0x0 bits 0x13/0x0 rrc: 67 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xb9b8ae4839065fd3 expref: 167 pid: 102658 timeout: 628132 lvb_type: 0 May 05 17:32:22 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 05 17:32:22 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff984a6db89b00/0xce8853a3107838d9 lrc: 3/0,0 mode: PR/PR res: [0x2c00130be:0x1bf8:0x0].0x0 bits 0x13/0x0 rrc: 67 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xb9b8ae4839065fd3 expref: 168 pid: 102658 timeout: 0 lvb_type: 0 May 05 17:32:22 fir-md1-s1 kernel: Lustre: 
102371:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:1s); client may timeout. req@ffff9836aaa4ad00 x1631560547253488/t280092476500(0) o36->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:17/0 lens 520/424 e 0 to 0 dl 1557102741 ref 1 fl Complete:/0/0 rc 0/0 May 05 17:32:23 fir-md1-s1 kernel: Lustre: 103235:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff985ba2b8c500 x1631597618183872/t0(0) o36->1150d024-edb9-77c4-d1f6-595b5a47780e@10.9.106.23@o2ib4:28/0 lens 576/2888 e 0 to 0 dl 1557102748 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:32:23 fir-md1-s1 kernel: Lustre: 103235:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 17:32:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cc2c8b04-003e-22cf-de7e-1188f933372b (at 10.8.27.23@o2ib6) in 181 seconds. I think it's dead, and I am evicting it. exp ffff983a357b7c00, cur 1557102758 expire 1557102608 last 1557102577 May 05 17:33:24 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 803c4c64-c10c-05dd-a24b-fc5f4622ac17 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983fccb19400, cur 1557102804 expire 1557102654 last 1557102577 May 05 17:36:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 055fe700-e660-26fd-0272-9ee3f0a19d6c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984b567ee400, cur 1557102976 expire 1557102826 last 1557102749 May 05 17:42:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 05 17:42:43 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages May 05 17:44:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b4bd7b93-cacc-15b1-2cfb-98eabdadef45 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff981f847ab000, cur 1557103460 expire 1557103310 last 1557103233 May 05 17:44:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 05 17:53:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b2b98883-4628-1f53-309c-6ef2c28133a9 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9846ea385000, cur 1557104011 expire 1557103861 last 1557103784 May 05 17:53:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 17:53:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 17:53:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 05 18:05:54 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 18:09:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 18:09:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 18:10:00 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557104993/real 1557104993] req@ffff98215061fb00 x1632336497346320/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557105000 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 18:10:00 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 05 18:10:00 fir-md1-s1 kernel: LustreError: 102593:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff98215061fb00 x1632336497346320 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff984c83b6fbc0/0xce8853a38c14c820 lrc: 4/0,0 mode: PW/PW res: [0x2c0024036:0x11:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0xa62e1acf56e265af expref: 142 pid: 102476 timeout: 0 lvb_type: 0 May 05 18:10:00 
fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 05 18:10:00 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 221s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff984c83b6fbc0/0xce8853a38c14c820 lrc: 4/0,0 mode: PW/PW res: [0x2c0024036:0x11:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0xa62e1acf56e265af expref: 143 pid: 102476 timeout: 0 lvb_type: 0 May 05 18:10:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3a266b78-35e3-8d2e-d60d-b1d3410316d8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982157275800, cur 1557105035 expire 1557104885 last 1557104808 May 05 18:10:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 18:12:46 fir-md1-s1 kernel: Lustre: 102394:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 05 18:13:11 fir-md1-s1 kernel: Lustre: 102659:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 05 18:13:12 fir-md1-s1 kernel: Lustre: 102483:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 05 18:13:12 fir-md1-s1 kernel: Lustre: 102483:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2015 previous similar messages May 05 18:13:14 fir-md1-s1 kernel: Lustre: 102646:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 05 18:13:14 fir-md1-s1 kernel: Lustre: 102646:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2934 previous similar messages May 05 18:13:18 fir-md1-s1 kernel: Lustre: 101920:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 05 18:13:18 
fir-md1-s1 kernel: Lustre: 101920:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6439 previous similar messages May 05 18:13:26 fir-md1-s1 kernel: Lustre: 102394:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 05 18:13:26 fir-md1-s1 kernel: Lustre: 102394:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12577 previous similar messages May 05 18:37:52 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557106665/real 1557106665] req@ffff982b85bbe300 x1632336892852240/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557106672 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 18:38:00 fir-md1-s1 kernel: Lustre: 102381:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981cecaa2d00 x1632100517257984/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:5/0 lens 480/568 e 1 to 0 dl 1557106685 ref 2 fl Interpret:/0/0 rc 0/0 May 05 18:38:00 fir-md1-s1 kernel: Lustre: 102381:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 18:38:06 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557106679/real 1557106679] req@ffff982b85bbe300 x1632336892852240/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557106686 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 18:38:06 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 18:38:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 18:38:06 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages May 05 18:38:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 18:38:06 
fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 18:38:27 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557106700/real 1557106700] req@ffff982b85bbe300 x1632336892852240/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557106707 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 18:38:27 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 18:39:09 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557106742/real 1557106742] req@ffff982b85bbe300 x1632336892852240/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557106749 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 18:39:09 fir-md1-s1 kernel: Lustre: 102385:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 18:39:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 18:39:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 18:39:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 18:39:31 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 05 18:40:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f0de2cdc-c8d7-f876-7ae9-dfeeafee4256 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984ab92b1800, cur 1557106804 expire 1557106654 last 1557106577 May 05 18:40:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 18:53:25 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 18:55:18 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 19:06:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 750359b1-25ef-0897-47c0-badbd7fbdfde (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982985c0b400, cur 1557108386 expire 1557108236 last 1557108159 May 05 19:06:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 19:06:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 19:06:42 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 05 19:21:58 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557109311/real 1557109311] req@ffff9824647d4800 x1632337515406016/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557109318 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 19:21:58 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 05 19:22:06 fir-md1-s1 kernel: Lustre: 102569:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981ec5e9a700 x1632100668980848/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:11/0 lens 480/568 e 1 to 0 dl 1557109331 ref 2 fl Interpret:/0/0 rc 0/0 May 05 19:22:12 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557109325/real 1557109325] req@ffff9821ca4ba700 
x1632337515406640/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557109332 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 19:22:12 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 19:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 19:22:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 19:22:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 19:22:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 19:22:33 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557109346/real 1557109346] req@ffff9824647d4800 x1632337515406016/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557109353 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 19:22:33 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 19:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 19:22:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 19:22:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 19:23:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 31108e76-1364-d253-fa5d-86986522d1ba (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983171e24000, cur 1557109385 expire 1557109235 last 1557109158 May 05 19:23:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 19:23:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 19:29:46 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 19:49:32 fir-md1-s1 kernel: LNetError: 101315:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 20:02:05 fir-md1-s1 kernel: LNetError: 101319:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 20:27:00 fir-md1-s1 kernel: LNetError: 101316:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 20:55:23 fir-md1-s1 kernel: Lustre: 102439:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557114916/real 1557114916] req@ffff9845ab0a3f00 x1632338793144032/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557114923 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 20:55:23 fir-md1-s1 kernel: Lustre: 102439:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages May 05 20:55:30 fir-md1-s1 kernel: Lustre: 102439:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557114923/real 1557114923] req@ffff9845ab0a3f00 x1632338793144032/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557114930 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 20:55:31 fir-md1-s1 kernel: Lustre: 102539:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9849f472d400 x1631397889094816/t0(0) o36->63d5b964-3268-0813-20ce-ce2c87bbd1d8@10.8.29.2@o2ib6:6/0 lens 520/2888 e 1 to 0 dl 1557114936 ref 2 fl 
Interpret:/0/0 rc 0/0 May 05 20:55:31 fir-md1-s1 kernel: Lustre: 102539:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 20:55:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 63d5b964-3268-0813-20ce-ce2c87bbd1d8 (at 10.8.29.2@o2ib6) reconnecting May 05 20:55:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 20:55:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7cd3b03e-5b9b-97b8-40d1-673a4215b949 (at 10.8.29.2@o2ib6) May 05 20:55:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 20:55:44 fir-md1-s1 kernel: Lustre: 102439:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557114937/real 1557114937] req@ffff9845ab0a3f00 x1632338793144032/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557114944 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 20:55:44 fir-md1-s1 kernel: Lustre: 102439:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 20:55:51 fir-md1-s1 kernel: LustreError: 102439:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.114.5@o2ib4) failed to reply to blocking AST (req@ffff9845ab0a3f00 x1632338793144032 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff981feb3a5c40/0xce8853a5b4441326 lrc: 4/0,0 mode: PR/PR res: [0x2c00128ef:0x1591c:0x0].0x0 bits 0x13/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.9.114.5@o2ib4 remote: 0x606af16dd60613da expref: 37 pid: 102451 timeout: 640245 lvb_type: 0 May 05 20:55:51 fir-md1-s1 kernel: LustreError: 102439:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 3 previous similar messages May 05 20:55:51 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.114.5@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 05 20:55:51 fir-md1-s1 kernel: LustreError: Skipped 3 previous similar messages May 05 20:55:51 fir-md1-s1 kernel: LustreError: 
101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.114.5@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff981feb3a5c40/0xce8853a5b4441326 lrc: 3/0,0 mode: PR/PR res: [0x2c00128ef:0x1591c:0x0].0x0 bits 0x13/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.9.114.5@o2ib4 remote: 0x606af16dd60613da expref: 38 pid: 102451 timeout: 0 lvb_type: 0 May 05 20:58:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 514ad686-cd5b-be48-3af9-a061fcf7c5e8 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c8fb5e000, cur 1557115107 expire 1557114957 last 1557114880 May 05 20:58:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 20:58:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 05 20:58:34 fir-md1-s1 kernel: LNetError: 101319:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 21:26:47 fir-md1-s1 kernel: LNetError: 101313:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 21:38:05 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557117478/real 1557117478] req@ffff981eff223000 x1632339363951088/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557117485 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 21:38:05 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 21:38:12 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557117485/real 1557117485] req@ffff9823a0a82400 x1632339363951504/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557117492 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 21:38:13 fir-md1-s1 kernel: 
Lustre: 102501:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9828d666bc00 x1632101121427344/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/568 e 1 to 0 dl 1557117498 ref 2 fl Interpret:/0/0 rc 0/0 May 05 21:38:19 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557117492/real 1557117492] req@ffff9823a0a82400 x1632339363951504/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557117499 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 21:38:19 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 21:38:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 21:38:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 21:38:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 21:38:33 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557117506/real 1557117506] req@ffff981eff223000 x1632339363951088/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557117513 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 21:38:33 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 21:38:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 21:38:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 21:38:54 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557117527/real 1557117527] req@ffff9823a0a82400 x1632339363951504/t0(0) 
o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557117534 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 21:38:54 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 21:39:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 21:39:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 21:39:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 21:39:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 21:39:36 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557117569/real 1557117569] req@ffff981eff223000 x1632339363951088/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557117576 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 21:39:36 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 05 21:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 21:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 05 21:40:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client daa77fae-4b8c-75d0-1ecc-0a79e00652e7 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982ae1f96c00, cur 1557117601 expire 1557117451 last 1557117374 May 05 21:40:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 21:42:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 05 21:57:24 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 22:22:05 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 22:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 901c6d03-45fd-4c09-4368-156dc13152bb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9823e0fcf000, cur 1557121050 expire 1557120900 last 1557120823 May 05 22:37:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 22:38:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 22:38:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 22:43:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 22:43:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 22:43:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8e9e430a-c85d-67e0-c730-0438e3526f0d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984809336c00, cur 1557121414 expire 1557121264 last 1557121187 May 05 22:43:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 22:58:06 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 23:19:21 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557123553/real 1557123553] req@ffff983c14b36600 x1632340720101440/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557123560 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 23:19:21 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 05 23:19:28 fir-md1-s1 kernel: Lustre: 102360:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983c7deee000 x1632181586589776/t0(0) o101->2157fc51-ad2b-af56-95eb-ebec24704c29@10.8.1.5@o2ib6:3/0 lens 480/568 e 1 to 0 dl 1557123573 ref 2 fl Interpret:/0/0 rc 0/0 May 05 23:19:28 fir-md1-s1 kernel: Lustre: 102360:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 23:19:35 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557123568/real 1557123568] req@ffff983c14b36600 x1632340720101440/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557123575 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 23:19:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) reconnecting May 05 23:19:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2cd49213-7c4f-97de-a7e8-fa9afc8ff8ae (at 10.8.1.5@o2ib6) May 05 23:19:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 23:19:35 fir-md1-s1 kernel: Lustre: 
102595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 23:19:56 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557123589/real 1557123589] req@ffff983c14b36600 x1632340720101440/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557123596 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 23:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) reconnecting May 05 23:19:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2cd49213-7c4f-97de-a7e8-fa9afc8ff8ae (at 10.8.1.5@o2ib6) May 05 23:19:56 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 23:20:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) reconnecting May 05 23:20:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2cd49213-7c4f-97de-a7e8-fa9afc8ff8ae (at 10.8.1.5@o2ib6) May 05 23:20:38 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557123631/real 1557123631] req@ffff983c14b36600 x1632340720101440/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557123638 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 23:20:38 fir-md1-s1 kernel: Lustre: 102595:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 23:20:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) reconnecting May 05 23:20:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2cd49213-7c4f-97de-a7e8-fa9afc8ff8ae (at 10.8.1.5@o2ib6) May 05 23:20:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) reconnecting May 05 23:20:59 fir-md1-s1 kernel: 
Lustre: fir-MDT0000: Connection restored to 2cd49213-7c4f-97de-a7e8-fa9afc8ff8ae (at 10.8.1.5@o2ib6) May 05 23:21:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) reconnecting May 05 23:21:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2cd49213-7c4f-97de-a7e8-fa9afc8ff8ae (at 10.8.1.5@o2ib6) May 05 23:21:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 2cd49213-7c4f-97de-a7e8-fa9afc8ff8ae (at 10.8.1.5@o2ib6) May 05 23:21:48 fir-md1-s1 kernel: LustreError: 102595:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff983c14b36600 x1632340720101440 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff984cb75c9440/0xce8853a7a0425195 lrc: 4/0,0 mode: PW/PW res: [0x20002189b:0x9:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x30d9d73ba77ca8db expref: 35 pid: 101749 timeout: 649121 lvb_type: 0 May 05 23:21:48 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 05 23:21:48 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff984cb75c9440/0xce8853a7a0425195 lrc: 4/0,0 mode: PW/PW res: [0x20002189b:0x9:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x30d9d73ba77ca8db expref: 36 pid: 101749 timeout: 0 lvb_type: 0 May 05 23:22:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e711c326-9f7d-56b9-9495-262ae7e853f7 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9847ca266400, cur 1557123749 expire 1557123599 last 1557123522 May 05 23:22:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 23:22:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 05 23:24:05 fir-md1-s1 kernel: LNetError: 101315:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 23:47:23 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557125236/real 1557125236] req@ffff98213d6a3000 x1632341078037488/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557125243 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 23:47:23 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 05 23:47:31 fir-md1-s1 kernel: Lustre: 102548:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982be1764200 x1632101625786688/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:6/0 lens 480/568 e 1 to 0 dl 1557125256 ref 2 fl Interpret:/0/0 rc 0/0 May 05 23:47:37 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557125250/real 1557125250] req@ffff98213d6a3000 x1632341078037488/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557125257 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 23:47:37 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 23:47:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 23:47:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 23:47:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 05 23:47:37 
fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 05 23:47:58 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557125271/real 1557125271] req@ffff98213d6a3000 x1632341078037488/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557125278 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 23:47:58 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 23:47:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 23:47:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 05 23:48:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 23:48:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 05 23:48:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 23:48:40 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557125313/real 1557125313] req@ffff98213d6a3000 x1632341078037488/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557125320 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 23:48:40 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 23:49:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 05 23:49:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 23:49:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 05 23:49:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 
23:49:57 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557125390/real 1557125390] req@ffff98213d6a3000 x1632341078037488/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557125397 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 23:49:57 fir-md1-s1 kernel: Lustre: 102570:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 05 23:50:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b1c12f7e-0a87-6b2a-55d8-d25e6cdfd920 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cb1686c00, cur 1557125407 expire 1557125257 last 1557125180 May 05 23:50:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 23:50:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 05 23:50:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b1c12f7e-0a87-6b2a-55d8-d25e6cdfd920 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98465de6c800, cur 1557125426 expire 1557125276 last 1557125199 May 05 23:50:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 05 23:50:26 fir-md1-s1 kernel: Lustre: 102570:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (188:2s); client may timeout. 
req@ffff982be1764200 x1632101625786688/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:6/0 lens 480/536 e 1 to 0 dl 1557125424 ref 1 fl Complete:/0/0 rc 301/301 May 05 23:50:26 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 05 23:51:55 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 01:12:41 fir-md1-s1 kernel: Lustre: 102619:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 06 01:12:41 fir-md1-s1 kernel: Lustre: 102619:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14314 previous similar messages May 06 01:12:43 fir-md1-s1 kernel: Lustre: 102485:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 06 01:12:43 fir-md1-s1 kernel: Lustre: 102485:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3757 previous similar messages May 06 01:12:47 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 06 01:12:47 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7433 previous similar messages May 06 01:12:55 fir-md1-s1 kernel: Lustre: 102370:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 06 01:12:55 fir-md1-s1 kernel: Lustre: 102370:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14789 previous similar messages May 06 01:13:11 fir-md1-s1 kernel: Lustre: 102376:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 06 01:13:11 fir-md1-s1 kernel: Lustre: 102376:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16094 previous similar messages May 06 01:13:43 fir-md1-s1 kernel: Lustre: 102376:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for 
user 1: -22 May 06 01:13:43 fir-md1-s1 kernel: Lustre: 102376:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8106 previous similar messages May 06 01:14:48 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 06 01:14:48 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 19184 previous similar messages May 06 01:27:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 24ca8116-461e-c903-a940-fe7ca2d04ce6 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9848e2ed4000, cur 1557131253 expire 1557131103 last 1557131026 May 06 01:34:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 06 01:34:46 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 06 01:40:17 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 01:43:51 fir-md1-s1 kernel: Lustre: 102646:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557132224/real 1557132224] req@ffff98254ae78000 x1632342607489840/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557132231 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 01:43:51 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557132224/real 1557132224] req@ffff98220bfec200 x1632342607489776/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557132231 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 01:43:51 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 06 01:43:58 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557132231/real 
1557132231] req@ffff98220bfec200 x1632342607489776/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557132238 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 01:43:59 fir-md1-s1 kernel: Lustre: 102370:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982a44368f00 x1632101922272992/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:4/0 lens 480/568 e 1 to 0 dl 1557132244 ref 2 fl Interpret:/0/0 rc 0/0 May 06 01:44:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 01:44:05 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 06 01:44:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 06 01:44:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 01:44:05 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557132238/real 1557132238] req@ffff98220bfec200 x1632342607489776/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557132245 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 01:44:05 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 01:44:19 fir-md1-s1 kernel: Lustre: 102646:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557132252/real 1557132252] req@ffff98254ae78000 x1632342607489840/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557132259 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 01:44:19 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557132252/real 1557132252] req@ffff98220bfec200 x1632342607489776/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557132259 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 May 06 01:44:19 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 01:44:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 01:44:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 01:44:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 06 01:44:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 01:44:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9177b8cc-fbf6-7092-3950-f0b9a9cae43a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982847370800, cur 1557132290 expire 1557132140 last 1557132063 May 06 01:44:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 01:53:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 67d4a7c9-5d29-d88b-f4cd-c33daa6af09b (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984700630c00, cur 1557132810 expire 1557132660 last 1557132583 May 06 01:53:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 01:54:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.15.4@o2ib6) May 06 01:54:20 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 06 02:08:47 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 02:27:00 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 02:29:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d77b9fc6-f5d4-2a19-3520-fb6d1e074654 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984a0e67b400, cur 1557134977 expire 1557134827 last 1557134750 May 06 02:29:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 02:31:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 06 02:31:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 02:39:17 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557135550/real 1557135550] req@ffff9821e3eab000 x1632343361501552/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557135557 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 02:39:17 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 06 02:39:24 fir-md1-s1 kernel: Lustre: 102440:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557135557/real 1557135557] req@ffff9822c8aa2700 x1632343361501664/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557135564 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 02:39:25 fir-md1-s1 kernel: Lustre: 102477:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98231e3fb300 x1632102046623216/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:0/0 lens 480/568 e 1 to 0 dl 1557135570 ref 2 fl Interpret:/0/0 rc 0/0 May 06 02:39:25 fir-md1-s1 kernel: Lustre: 102477:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 06 02:39:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 02:39:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 06 02:39:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 02:39:38 fir-md1-s1 kernel: Lustre: 102440:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ 
Request sent has timed out for slow reply: [sent 1557135571/real 1557135571] req@ffff9822c8aa2700 x1632343361501664/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557135578 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 02:39:38 fir-md1-s1 kernel: Lustre: 102440:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 06 02:39:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 02:39:59 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557135592/real 1557135592] req@ffff9821e3eab000 x1632343361501552/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557135599 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 02:39:59 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 06 02:40:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 02:40:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 06 02:40:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 02:40:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f49684ef-e9d0-d51f-e4f9-6eb30344d63e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98259be4a400, cur 1557135624 expire 1557135474 last 1557135397 May 06 02:40:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 02:40:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f49684ef-e9d0-d51f-e4f9-6eb30344d63e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983613234c00, cur 1557135631 expire 1557135481 last 1557135404 May 06 02:47:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 03:01:02 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557136854/real 1557136854] req@ffff9822c8aa3300 x1632343431535792/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557136861 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 03:01:02 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 06 03:01:09 fir-md1-s1 kernel: Lustre: 102474:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557136861/real 1557136861] req@ffff983601246600 x1632343431535872/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557136868 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 03:01:09 fir-md1-s1 kernel: Lustre: 102435:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b81b13f00 x1632102084634304/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:14/0 lens 480/568 e 1 to 0 dl 1557136874 ref 2 fl Interpret:/0/0 rc 0/0 May 06 03:01:09 fir-md1-s1 kernel: Lustre: 102435:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 06 03:01:09 fir-md1-s1 kernel: Lustre: 102474:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 03:01:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:01:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 06 03:01:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 03:01:23 fir-md1-s1 kernel: Lustre: 
102502:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557136876/real 1557136876] req@ffff9822c8aa3300 x1632343431535792/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557136883 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 03:01:23 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 03:01:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:01:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 06 03:01:44 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557136897/real 1557136897] req@ffff9822c8aa3300 x1632343431535792/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557136904 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 03:01:44 fir-md1-s1 kernel: Lustre: 102502:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 06 03:01:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:02:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:02:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 06 03:02:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 03:02:26 fir-md1-s1 kernel: Lustre: 102474:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557136939/real 1557136939] req@ffff983601246600 x1632343431535872/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557136946 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 03:02:26 fir-md1-s1 kernel: Lustre: 102474:0:(client.c:2132:ptlrpc_expire_one_request()) 
Skipped 10 previous similar messages May 06 03:02:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:02:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6cfba528-87e9-efcd-a4bf-8720e8486959 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98371ab9dc00, cur 1557136970 expire 1557136820 last 1557136743 May 06 03:02:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 03:03:52 fir-md1-s1 kernel: LNetError: 101310:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 03:04:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 06 03:04:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 03:13:21 fir-md1-s1 kernel: Lustre: 101920:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557137594/real 1557137594] req@ffff9826fbac4e00 x1632343563198160/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557137601 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 03:13:21 fir-md1-s1 kernel: Lustre: 101920:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 06 03:13:28 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 3f6afc54-0ba8-2542-ecfe-dafbe684ac27 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9822bb51c800, cur 1557137608 expire 1557137458 last 1557137381 May 06 03:13:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 03:13:29 fir-md1-s1 kernel: Lustre: 102519:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982ce2be6900 x1632102103677344/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:4/0 lens 480/568 e 1 to 0 dl 1557137614 ref 2 fl Interpret:/0/0 rc 0/0 May 06 03:13:29 fir-md1-s1 kernel: Lustre: 102519:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 06 03:13:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:13:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 06 03:13:35 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 03:13:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7183b88b-811e-f5b2-96a6-c6fe99d7288b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984b9826f800, cur 1557137629 expire 1557137479 last 1557137402 May 06 03:18:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d6fba4ba-5f7d-7172-1bf9-db6eb412de93 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98259e159800, cur 1557137915 expire 1557137765 last 1557137688 May 06 03:18:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 03:19:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 03:19:05 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 06 03:22:39 fir-md1-s1 kernel: Lustre: 101913:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982003aa2d00 x1632102120722928/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:14/0 lens 480/568 e 1 to 0 dl 1557138164 ref 2 fl Interpret:/0/0 rc 0/0 May 06 03:22:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:22:55 fir-md1-s1 kernel: Lustre: 102498:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557138144/real 1557138144] req@ffff98275e260900 x1632343692286608/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557138175 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 03:22:55 fir-md1-s1 kernel: Lustre: 102498:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 06 03:23:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:23:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:24:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7bc6242f-6717-4eef-3be9-f35c1f51d2fe (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9839e731d000, cur 1557138248 expire 1557138098 last 1557138021 May 06 03:24:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 03:25:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3a103876-5e4f-5367-0db9-e25b704853af (at 10.8.26.4@o2ib6) in 225 seconds. I think it's dead, and I am evicting it. exp ffff982adafa2800, cur 1557138324 expire 1557138174 last 1557138099 May 06 03:25:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 03:25:25 fir-md1-s1 kernel: Lustre: 102475:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982a25b2c800 x1632102124377648/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:0/0 lens 480/568 e 0 to 0 dl 1557138330 ref 2 fl Interpret:/0/0 rc 0/0 May 06 03:33:03 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fa1763e1-d3e1-7734-ad19-cfa61e3b8bee (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982684c7d400, cur 1557138783 expire 1557138633 last 1557138556 May 06 03:33:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 03:33:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 03:33:54 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages May 06 03:34:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 343ba3e3-d70f-a3da-f13c-3f6a9bc1bebd (at 10.8.27.23@o2ib6) in 151 seconds. I think it's dead, and I am evicting it. 
exp ffff983a5e288800, cur 1557138859 expire 1557138709 last 1557138708 May 06 03:34:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 03:35:14 fir-md1-s1 kernel: Lustre: 102456:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981ee5e69800 x1632102145679184/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:19/0 lens 480/568 e 1 to 0 dl 1557138919 ref 2 fl Interpret:/0/0 rc 0/0 May 06 03:35:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 03:35:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 03:35:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 343ba3e3-d70f-a3da-f13c-3f6a9bc1bebd (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984752efd000, cur 1557138929 expire 1557138779 last 1557138702 May 06 03:42:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 520bd10e-9f0d-8e63-4df7-8d30dee0285d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cbb409000, cur 1557139340 expire 1557139190 last 1557139113 May 06 03:42:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 03:45:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 03:45:30 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 06 03:51:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a9a926b0-41e3-b96d-29a7-ef486f00a747 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983fe6a6f000, cur 1557139910 expire 1557139760 last 1557139683 May 06 03:51:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 04:02:04 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557140517/real 1557140517] req@ffff98228ad9e900 x1632344244977696/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557140524 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 04:02:04 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 06 04:02:12 fir-md1-s1 kernel: Lustre: 102456:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982921a8ad00 x1632102189956992/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:17/0 lens 480/568 e 1 to 0 dl 1557140537 ref 2 fl Interpret:/0/0 rc 0/0 May 06 04:02:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 04:02:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 06 04:02:18 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 04:02:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 04:02:46 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557140559/real 1557140559] req@ffff98228ad9e900 x1632344244977696/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557140566 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 04:02:46 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 04:03:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 
04:03:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 04:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 04:04:03 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557140636/real 1557140636] req@ffff98228ad9e900 x1632344244977696/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557140643 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 04:04:03 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 06 04:04:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 04:04:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ff17c06-109f-408d-69ea-950054b255fd (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983bc6a35400, cur 1557140656 expire 1557140506 last 1557140429 May 06 04:04:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 04:16:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f12a8c98-f4fd-747b-2ce7-336123fd5d99 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98416b2da000, cur 1557141377 expire 1557141227 last 1557141150 May 06 04:16:17 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 04:17:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 04:17:43 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 06 04:35:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 12182b47-d21c-6246-b9eb-d0d956754233 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9828ba78e400, cur 1557142524 expire 1557142374 last 1557142297 May 06 04:35:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 04:35:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 04:35:36 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 04:47:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ce07cd06-fc7e-cd7e-5154-e77e8edfdcd9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98383cab8000, cur 1557143253 expire 1557143103 last 1557143026 May 06 04:47:33 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 04:51:36 fir-md1-s1 kernel: LNetError: 101324:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 04:52:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 04:52:37 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 04:59:48 fir-md1-s1 kernel: Lustre: 102438:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557143981/real 1557143981] req@ffff981f0037da00 x1632345039951648/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557143988 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 04:59:48 fir-md1-s1 kernel: Lustre: 102438:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 04:59:56 fir-md1-s1 kernel: Lustre: 102394:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff985cd85aaa00 x1632102286169632/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:1/0 lens 480/568 e 1 to 0 dl 1557144001 ref 2 fl Interpret:/0/0 rc 0/0 May 06 05:00:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) 
reconnecting May 06 05:00:09 fir-md1-s1 kernel: Lustre: 102438:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557144002/real 1557144002] req@ffff981f0037da00 x1632345039951648/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557144009 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 05:00:09 fir-md1-s1 kernel: Lustre: 102438:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 05:00:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 05:00:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 05:00:51 fir-md1-s1 kernel: Lustre: 102438:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557144044/real 1557144044] req@ffff981f0037da00 x1632345039951648/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557144051 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 05:00:51 fir-md1-s1 kernel: Lustre: 102438:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 05:01:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 05:01:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f9e9b045-6235-217c-c69f-23d78fca9da8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9837c96d8c00, cur 1557144074 expire 1557143924 last 1557143847 May 06 05:01:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 05:01:28 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 05:03:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 05:03:15 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 06 05:06:29 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557144382/real 1557144382] req@ffff982886225a00 x1632345140721600/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557144389 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 05:06:29 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 06 05:06:47 fir-md1-s1 kernel: Lustre: 102370:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9852c578dd00 x1632102293361664/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:22/0 lens 480/568 e 0 to 0 dl 1557144412 ref 2 fl Interpret:/0/0 rc 0/0 May 06 05:06:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 05:07:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 05:07:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 05:09:03 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557144536/real 1557144536] req@ffff982886225a00 x1632345140721600/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557144543 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 
05:09:03 fir-md1-s1 kernel: Lustre: 102537:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages May 06 05:13:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 16ccca64-89bf-afcb-7d02-bc38a437f431 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983808639000, cur 1557144838 expire 1557144688 last 1557144611 May 06 05:13:58 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 06 05:14:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 05:14:03 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages May 06 05:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6a4419ac-54d1-b937-b5a1-a7ec9f100330 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98360ea71c00, cur 1557145447 expire 1557145297 last 1557145220 May 06 05:24:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 05:29:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 06 05:29:25 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 05:37:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e0482b95-aeb3-d11f-cade-4a17ed2e2a52 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983f0f60c800, cur 1557146236 expire 1557146086 last 1557146009 May 06 05:37:16 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 05:42:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 06 05:42:19 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 05:51:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f7655cbe-45b1-ffd6-075f-41c251406fe1 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9839b2ea9000, cur 1557147070 expire 1557146920 last 1557146843 May 06 05:51:10 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 06:00:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 06 06:00:29 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 06 06:08:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 706384ea-f41b-ff57-ed02-af4b44495488 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98384ba18400, cur 1557148094 expire 1557147944 last 1557147867 May 06 06:08:14 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 06 06:16:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 06:16:25 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 06:22:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8d53a37e-b13e-696a-792a-63f4340009d8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98200cb8dc00, cur 1557148940 expire 1557148790 last 1557148713 May 06 06:22:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 06:27:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 06:27:26 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 06:34:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 47e215b0-c979-bff7-f1bf-a67b2088f7a4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9829eb4e7000, cur 1557149699 expire 1557149549 last 1557149472 May 06 06:34:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 06:42:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 06:42:16 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 07:01:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c2a984b8-884a-1562-e36e-5bea7c087c2e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981fb4be6400, cur 1557151317 expire 1557151167 last 1557151090 May 06 07:01:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 07:02:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 07:02:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 07:08:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 97e06487-815b-ad03-ae76-bb0bf9a66e68 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984bd6709000, cur 1557151721 expire 1557151571 last 1557151494 May 06 07:08:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 07:09:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 07:09:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 07:13:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5c2667ef-405b-08b1-d8c7-f2b8cfd105f2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984a11b24400, cur 1557152005 expire 1557151855 last 1557151778 May 06 07:13:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 07:13:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 07:13:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 07:19:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 07:19:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 07:19:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client db5a2821-4ddb-74b9-4a39-2df36e495426 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984b611e5c00, cur 1557152392 expire 1557152242 last 1557152165 May 06 07:19:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 07:25:19 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 07:34:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 327b978a-137f-70bf-43b8-6f356295a322 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9821973ff800, cur 1557153257 expire 1557153107 last 1557153030 May 06 07:34:17 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 07:35:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 07:35:03 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 07:48:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 60d06d9f-490e-bb13-4e7a-db59a5f7fc49 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff981fd4e13000, cur 1557154134 expire 1557153984 last 1557153907 May 06 07:48:54 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 07:49:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 07:49:07 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 08:04:00 fir-md1-s1 kernel: Lustre: 102667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557155033/real 1557155033] req@ffff982847524800 x1632347683430816/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557155040 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 08:04:08 fir-md1-s1 kernel: Lustre: 101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9823f837bf00 x1632102602232704/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:13/0 lens 480/568 e 1 to 0 dl 1557155053 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:04:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:04:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 08:04:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 06 08:04:14 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 08:04:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:04:42 fir-md1-s1 kernel: Lustre: 102667:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557155075/real 1557155075] req@ffff982847524800 x1632347683430816/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557155082 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 08:04:42 fir-md1-s1 kernel: Lustre: 102667:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous 
similar messages May 06 08:04:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:05:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fdef6866-4d02-f296-44b6-04d50d09abe2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9848896b0400, cur 1557155120 expire 1557154970 last 1557154893 May 06 08:05:20 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 08:15:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 93b9084b-1c2e-8cf4-5955-85e835bcdcb2 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9857929fdc00, cur 1557155744 expire 1557155594 last 1557155517 May 06 08:15:44 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 08:16:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 93524e03-763b-9556-15d0-9c57c97e51dd (at 10.8.21.21@o2ib6) May 06 08:16:11 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 06 08:23:30 fir-md1-s1 kernel: Lustre: 102619:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557156202/real 1557156202] req@ffff982b04b53900 x1632347977213680/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557156209 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 08:23:30 fir-md1-s1 kernel: Lustre: 102619:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 08:23:37 fir-md1-s1 kernel: Lustre: 102655:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982ad5342400 x1632102633437424/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:12/0 lens 480/568 e 1 to 0 dl 1557156222 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:23:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 
08:23:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 08:23:44 fir-md1-s1 kernel: Lustre: 102619:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557156217/real 1557156217] req@ffff982b04b53900 x1632347977213680/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557156224 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 08:23:44 fir-md1-s1 kernel: Lustre: 102619:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 08:24:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:24:05 fir-md1-s1 kernel: Lustre: 102619:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557156238/real 1557156238] req@ffff982b04b53900 x1632347977213680/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557156245 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 08:24:05 fir-md1-s1 kernel: Lustre: 102619:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 08:24:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:24:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:24:47 fir-md1-s1 kernel: Lustre: 102619:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557156280/real 1557156280] req@ffff982b04b53900 x1632347977213680/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557156287 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 08:24:47 fir-md1-s1 kernel: Lustre: 102619:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 08:25:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) 
reconnecting May 06 08:25:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 08:25:50 fir-md1-s1 kernel: LustreError: 102619:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.21.21@o2ib6) returned error from glimpse AST (req@ffff982b04b53900 x1632347977213680 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff983e5b77f500/0xce8853ade9c22983 lrc: 4/0,0 mode: PW/PW res: [0x2c0024079:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.21.21@o2ib6 remote: 0xaf819789a4040101 expref: 45 pid: 102430 timeout: 0 lvb_type: 0 May 06 08:25:50 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.21.21@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 06 08:25:50 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 438s: evicting client at 10.8.21.21@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff983e5b77f500/0xce8853ade9c22983 lrc: 4/0,0 mode: PW/PW res: [0x2c0024079:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.21.21@o2ib6 remote: 0xaf819789a4040101 expref: 46 pid: 102430 timeout: 0 lvb_type: 0 May 06 08:26:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f6a1d91d-272c-37e3-2e1b-5d6077216791 (at 10.8.26.4@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. 
exp ffff982bfaf01400, cur 1557156402 expire 1557156252 last 1557156191 May 06 08:26:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 08:27:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 08:27:32 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages May 06 08:33:25 fir-md1-s1 kernel: Lustre: 102764:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557156798/real 1557156798] req@ffff98209ae8bc00 x1632348128501888/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557156805 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 08:33:25 fir-md1-s1 kernel: Lustre: 102764:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 06 08:33:54 fir-md1-s1 kernel: Lustre: 102431:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9820e8727800 x1632102647858592/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:29/0 lens 480/568 e 0 to 0 dl 1557156839 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:34:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 08:35:56 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557156949/real 1557156949] req@ffff98209ae8bc00 x1632348131307824/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557156956 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 08:35:56 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages May 06 08:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:36:36 fir-md1-s1 
kernel: Lustre: Skipped 4 previous similar messages May 06 08:43:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a319dd98-c842-c46e-24f6-b29aa32319a9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983ca148d400, cur 1557157416 expire 1557157266 last 1557157189 May 06 08:43:36 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages May 06 08:43:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 08:43:39 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 06 08:44:28 fir-md1-s1 kernel: Lustre: 101920:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982847524500 x1632102663144176/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:3/0 lens 480/568 e 0 to 0 dl 1557157473 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:44:34 fir-md1-s1 kernel: Lustre: 102568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557157443/real 1557157443] req@ffff982020f28f00 x1632348291023552/t0(0) o106->fir-MDT0002@10.8.21.21@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557157474 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 08:44:34 fir-md1-s1 kernel: Lustre: 102568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 08:44:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 08:48:28 fir-md1-s1 kernel: Lustre: 102383:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff985b09383f00 x1631715910163312/t0(0) o101->9017b2fd-d1de-a8da-328e-8aeae87aa675@10.9.102.60@o2ib4:3/0 lens 592/3264 e 1 to 0 dl 1557157713 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:48:29 fir-md1-s1 kernel: Lustre: 102991:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time 
(5/5), not sending early reply req@ffff98531a6e0f00 x1631564548361504/t0(0) o101->d4b9204a-0e97-30e6-4a34-9148370f1203@10.9.102.64@o2ib4:4/0 lens 592/3264 e 1 to 0 dl 1557157714 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:48:29 fir-md1-s1 kernel: Lustre: 102991:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 46 previous similar messages May 06 08:48:31 fir-md1-s1 kernel: Lustre: 103229:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98581ea5dd00 x1631604729059088/t0(0) o101->4708089c-6248-fcdb-7f37-38d59f3fc6d7@10.9.102.62@o2ib4:6/0 lens 592/3264 e 1 to 0 dl 1557157716 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:48:31 fir-md1-s1 kernel: Lustre: 103229:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 43 previous similar messages May 06 08:48:35 fir-md1-s1 kernel: Lustre: 102522:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982ee1ace900 x1631744023880256/t0(0) o101->aed2d6e2-3f34-1807-4b1c-5833e0ac7776@10.8.21.25@o2ib6:10/0 lens 592/3264 e 1 to 0 dl 1557157720 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:48:35 fir-md1-s1 kernel: Lustre: 102522:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 33 previous similar messages May 06 08:48:43 fir-md1-s1 kernel: Lustre: 102666:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982087b3a400 x1631707114667488/t0(0) o101->d45680cd-2f39-0507-b003-5a8dae1e864b@10.8.11.33@o2ib6:18/0 lens 592/3264 e 0 to 0 dl 1557157728 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:48:43 fir-md1-s1 kernel: Lustre: 102666:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 68 previous similar messages May 06 08:49:00 fir-md1-s1 kernel: Lustre: 102420:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff981d25b96f00 x1632741099384144/t0(0) 
o101->d7e49c9f-8146-7fe5-a38a-a362e877fd64@10.8.11.9@o2ib6:5/0 lens 592/3264 e 0 to 0 dl 1557157745 ref 2 fl Interpret:/0/0 rc 0/0 May 06 08:49:00 fir-md1-s1 kernel: Lustre: 102420:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 32 previous similar messages May 06 08:59:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 072fbe89-de73-a628-eacf-0142936a2a9d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9848ae293800, cur 1557158397 expire 1557158247 last 1557158170 May 06 08:59:57 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 09:02:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 09:02:41 fir-md1-s1 kernel: Lustre: Skipped 640 previous similar messages May 06 09:17:27 fir-md1-s1 kernel: Lustre: 102712:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557159440/real 1557159440] req@ffff98488bbab300 x1632348792707392/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557159447 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 09:17:27 fir-md1-s1 kernel: Lustre: 102712:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages May 06 09:17:35 fir-md1-s1 kernel: Lustre: 102511:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983b3fab0c00 x1631546416846208/t0(0) o101->c5d29146-8e69-99bb-85ae-0e928604facc@10.8.0.68@o2ib6:10/0 lens 1784/3288 e 1 to 0 dl 1557159460 ref 2 fl Interpret:/0/0 rc 0/0 May 06 09:17:35 fir-md1-s1 kernel: Lustre: 102511:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 16 previous similar messages May 06 09:17:39 fir-md1-s1 kernel: Lustre: 102463:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98329c1d4800 x1631550166639680/t0(0) 
o101->6137bba0-34c0-9107-d068-27095ef10964@10.8.22.23@o2ib6:14/0 lens 576/3264 e 1 to 0 dl 1557159464 ref 2 fl Interpret:/0/0 rc 0/0 May 06 09:17:39 fir-md1-s1 kernel: Lustre: 102463:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 37 previous similar messages May 06 09:17:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client c5d29146-8e69-99bb-85ae-0e928604facc (at 10.8.0.68@o2ib6) reconnecting May 06 09:17:41 fir-md1-s1 kernel: Lustre: Skipped 631 previous similar messages May 06 09:17:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to e89dcc01-5b37-54c0-22e3-05cfd841e08e (at 10.8.0.68@o2ib6) May 06 09:17:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 09:17:47 fir-md1-s1 kernel: Lustre: 102748:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98392e6af500 x1631806653874560/t0(0) o101->b633168b-f1eb-e25c-6763-0f69d52a1467@10.8.21.10@o2ib6:22/0 lens 576/3264 e 1 to 0 dl 1557159472 ref 2 fl Interpret:/0/0 rc 0/0 May 06 09:17:47 fir-md1-s1 kernel: Lustre: 102748:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages May 06 09:18:03 fir-md1-s1 kernel: Lustre: 102480:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff984811a7da00 x1631769542263904/t0(0) o101->49f5ef1e-74d3-5241-0e16-08e2b91b84c7@10.8.20.23@o2ib6:8/0 lens 576/3264 e 0 to 0 dl 1557159488 ref 2 fl Interpret:/0/0 rc 0/0 May 06 09:18:03 fir-md1-s1 kernel: Lustre: 102480:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 06 09:18:44 fir-md1-s1 kernel: Lustre: 102712:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557159517/real 1557159517] req@ffff98488bbab300 x1632348792707392/t0(0) o104->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557159524 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 09:18:44 fir-md1-s1 
kernel: Lustre: 102712:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 06 09:18:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 22c80c04-d88b-de12-67d6-cc493d448d56 (at 10.8.25.20@o2ib6) reconnecting May 06 09:18:45 fir-md1-s1 kernel: Lustre: Skipped 186 previous similar messages May 06 09:18:50 fir-md1-s1 kernel: LustreError: 102662:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557159440, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff982c895e4ec0/0xce8853ae75b104ad lrc: 3/1,0 mode: --/PR res: [0x2c001c0e4:0xe39a:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102662 timeout: 0 lvb_type: 0 May 06 09:18:50 fir-md1-s1 kernel: LustreError: 102662:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages May 06 09:18:52 fir-md1-s1 kernel: LustreError: 101687:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557159442, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9837316ae0c0/0xce8853ae75c5b680 lrc: 3/1,0 mode: --/PR res: [0x2c001c0e4:0xe39a:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 101687 timeout: 0 lvb_type: 0 May 06 09:18:52 fir-md1-s1 kernel: LustreError: 101687:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 29 previous similar messages May 06 09:18:57 fir-md1-s1 kernel: LustreError: 102725:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557159447, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff984916b05a00/0xce8853ae75f74d69 lrc: 3/1,0 mode: --/PR res: [0x2c001c0e4:0xe39a:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 
expref: -99 pid: 102725 timeout: 0 lvb_type: 0 May 06 09:18:57 fir-md1-s1 kernel: LustreError: 102725:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 10 previous similar messages May 06 09:19:08 fir-md1-s1 kernel: LustreError: 102664:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557159458, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff982b5117f980/0xce8853ae765ccf61 lrc: 3/1,0 mode: --/PR res: [0x2c001c0e4:0xe39a:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102664 timeout: 0 lvb_type: 0 May 06 09:19:08 fir-md1-s1 kernel: LustreError: 102664:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages May 06 09:19:24 fir-md1-s1 kernel: LustreError: 102403:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557159474, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff983af4306780/0xce8853ae76fdb29d lrc: 3/1,0 mode: --/PR res: [0x2c001c0e4:0xe39a:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102403 timeout: 0 lvb_type: 0 May 06 09:19:24 fir-md1-s1 kernel: LustreError: 102403:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages May 06 09:19:54 fir-md1-s1 kernel: LustreError: 102712:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) failed to reply to blocking AST (req@ffff98488bbab300 x1632348792707392 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff98437b7b69c0/0xce8853ae718bd1d9 lrc: 4/0,0 mode: PR/PR res: [0x2c001c0e4:0xe39a:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xb9dd517cab2f1891 expref: 41 pid: 102526 timeout: 685008 lvb_type: 0 May 06 09:19:54 fir-md1-s1 kernel: LustreError: 
138-a: fir-MDT0002: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 06 09:19:54 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff98437b7b69c0/0xce8853ae718bd1d9 lrc: 3/0,0 mode: PR/PR res: [0x2c001c0e4:0xe39a:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xb9dd517cab2f1891 expref: 42 pid: 102526 timeout: 0 lvb_type: 0 May 06 09:20:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b5268e10-4518-b666-695a-5d9cae456d77 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98318eb7e800, cur 1557159601 expire 1557159451 last 1557159374 May 06 09:20:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 09:30:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d7e49c9f-8146-7fe5-a38a-a362e877fd64 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9836d2b70400, cur 1557160231 expire 1557160081 last 1557160004 May 06 09:30:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 09:32:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 06 09:32:15 fir-md1-s1 kernel: Lustre: Skipped 382 previous similar messages May 06 09:49:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 09:49:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 09:56:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7e2298e1-48e8-e680-96bb-e15d97d99a0e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984171e12400, cur 1557161813 expire 1557161663 last 1557161586 May 06 09:56:53 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 10:01:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 10:01:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:09:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 469a77a9-f16a-f1b8-e132-3f330ca8fc3c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98357d7cc400, cur 1557162591 expire 1557162441 last 1557162364 May 06 10:09:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:13:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 10:13:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:16:18 fir-md1-s1 kernel: Lustre: 102388:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 06 10:16:18 fir-md1-s1 kernel: Lustre: 102388:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 29356 previous similar messages May 06 10:20:08 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557163201/real 1557163201] req@ffff9828a4e53900 x1632349728836192/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557163208 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 10:20:08 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 06 10:20:26 fir-md1-s1 kernel: Lustre: 102619:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982bd236fb00 x1632102903287280/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:1/0 lens 480/568 e 0 to 0 dl 1557163231 ref 2 fl Interpret:/0/0 rc 0/0 
May 06 10:20:26 fir-md1-s1 kernel: Lustre: 102619:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 06 10:20:29 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557163222/real 1557163222] req@ffff9828a4e53900 x1632349728836192/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557163229 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 10:20:29 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 10:20:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 10:20:32 fir-md1-s1 kernel: Lustre: Skipped 192 previous similar messages May 06 10:21:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 10:21:11 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557163264/real 1557163264] req@ffff9828a4e53900 x1632349728836192/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557163271 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 10:21:11 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 10:22:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 10:22:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 10:22:28 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557163341/real 1557163341] req@ffff9828a4e53900 x1632349728836192/t0(0) o106->fir-MDT0002@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557163348 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 10:22:28 fir-md1-s1 kernel: Lustre: 
102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 06 10:22:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 26a94d75-def1-f812-a8fb-3d07b8b33fa6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cb166cc00, cur 1557163361 expire 1557163211 last 1557163134 May 06 10:22:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:29:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bf7ddf25-2c21-97b7-fd16-0978818b941b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ea8711c00, cur 1557163748 expire 1557163598 last 1557163521 May 06 10:29:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:30:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 10:30:00 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages May 06 10:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 333a536a-25d4-142d-1889-07167f2dcdc6 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984cbb541000, cur 1557163926 expire 1557163776 last 1557163699 May 06 10:32:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:33:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fc4a73cd-b94b-5573-7947-7e92e3555159 (at 10.8.26.4@o2ib6) in 202 seconds. I think it's dead, and I am evicting it. exp ffff98314da3a000, cur 1557164002 expire 1557163852 last 1557163800 May 06 10:33:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:36:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fa94fe19-9187-6fef-639a-a6f5f2a6424e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9839b7a8f800, cur 1557164183 expire 1557164033 last 1557163956 May 06 10:36:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:36:52 fir-md1-s1 kernel: LNetError: 101315:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 10:45:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9223f222-77bc-063e-4d5b-9e61a153e868 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9837dee7e800, cur 1557164741 expire 1557164591 last 1557164514 May 06 10:45:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:52:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 10:52:59 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 06 10:53:33 fir-md1-s1 kernel: Lustre: 102744:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557165206/real 1557165206] req@ffff9823db250c00 x1632350282968512/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557165213 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 10:53:33 fir-md1-s1 kernel: Lustre: 102744:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 10:53:41 fir-md1-s1 kernel: Lustre: 102488:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9827def86300 x1632102987507648/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:16/0 lens 480/568 e 1 to 0 dl 1557165226 ref 2 fl Interpret:/0/0 rc 0/0 May 06 10:53:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 10:53:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 10:53:54 fir-md1-s1 kernel: Lustre: 102744:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed 
out for slow reply: [sent 1557165227/real 1557165227] req@ffff9823db250c00 x1632350282968512/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557165234 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 10:53:54 fir-md1-s1 kernel: Lustre: 102744:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 10:54:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 10:54:18 fir-md1-s1 kernel: Lustre: 102735:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9852a4774500 x1631544162444992/t0(0) o101->6576db86-576c-58c4-907d-b54174076c6b@10.9.104.9@o2ib4:23/0 lens 480/568 e 0 to 0 dl 1557165263 ref 2 fl Interpret:/0/0 rc 0/0 May 06 10:54:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 3bbb5d72-ae4e-c45c-50f5-493f69eb9321 (at 10.9.104.9@o2ib4) May 06 10:54:24 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 06 10:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 10:54:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 10:54:35 fir-md1-s1 kernel: Lustre: 102452:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557165268/real 1557165268] req@ffff9839482f0300 x1632350291469904/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557165275 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 10:54:35 fir-md1-s1 kernel: Lustre: 102452:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 06 10:55:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 10:55:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:55:52 fir-md1-s1 kernel: Lustre: 
102452:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557165345/real 1557165345] req@ffff9839482f0300 x1632350291469904/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557165352 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 10:55:52 fir-md1-s1 kernel: Lustre: 102452:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages May 06 10:56:27 fir-md1-s1 kernel: LustreError: 102452:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff9839482f0300 x1632350291469904 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff982a3e74f080/0xce8853af51345417 lrc: 6/0,0 mode: PW/PW res: [0x20002186c:0x9:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xe937331d332a6988 expref: 22 pid: 102713 timeout: 690801 lvb_type: 0 May 06 10:56:27 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 06 10:56:27 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff982a3e74f080/0xce8853af51345417 lrc: 5/0,0 mode: PW/PW res: [0x20002186c:0x9:0x0].0x0 bits 0x40/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xe937331d332a6988 expref: 23 pid: 102713 timeout: 0 lvb_type: 0 May 06 10:56:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b5850518-2b56-b7a4-1a2a-ce10c7c061ef (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9845fbaad800, cur 1557165396 expire 1557165246 last 1557165169 May 06 10:56:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 10:56:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 06 10:56:54 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 06 11:02:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 21995ffb-bda9-2f51-f494-926f5bb5a4ab (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984a4df95c00, cur 1557165760 expire 1557165610 last 1557165533 May 06 11:02:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 11:05:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 11:05:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 11:10:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1235fdf1-ea0b-2ff3-7b7a-5905efcd76aa (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983582e74000, cur 1557166241 expire 1557166091 last 1557166014 May 06 11:10:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 11:23:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 11:23:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 11:24:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4ce1a997-ba24-0e18-3bc8-a3a73955800e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9834b12b5c00, cur 1557167084 expire 1557166934 last 1557166857 May 06 11:24:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 11:27:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1b6db515-0e2a-5fd1-64db-d21172cdc45d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98314bf74800, cur 1557167266 expire 1557167116 last 1557167039 May 06 11:27:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 11:28:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1b6db515-0e2a-5fd1-64db-d21172cdc45d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984a0fe20800, cur 1557167282 expire 1557167132 last 1557167055 May 06 11:32:03 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2c25c505-2796-b3fb-36ac-b8e5c19b79fb (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9823b9b7f400, cur 1557167523 expire 1557167373 last 1557167296 May 06 11:32:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 11:36:31 fir-md1-s1 kernel: Lustre: 102653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557167784/real 1557167784] req@ffff98465a2c0f00 x1632351073839056/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557167791 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 11:36:31 fir-md1-s1 kernel: Lustre: 102653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 06 11:36:39 fir-md1-s1 kernel: Lustre: 102649:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b5ba99200 x1631530253331632/t0(0) o101->0b49eccd-cda4-7bac-8560-4f28415786a3@10.9.0.62@o2ib4:14/0 lens 1784/3288 e 1 to 0 dl 1557167804 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:36:40 fir-md1-s1 kernel: Lustre: 102405:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984c35f0ce00 x1631559393386976/t0(0) o101->dacb83f0-b432-ea21-cf1b-fb1ac63fd0b0@10.9.101.62@o2ib4:15/0 lens 592/3264 e 1 to 0 dl 1557167805 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:36:40 fir-md1-s1 kernel: Lustre: 102405:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 
previous similar messages May 06 11:36:41 fir-md1-s1 kernel: Lustre: 101911:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984b9f6f6600 x1631815396192864/t0(0) o101->6da928ad-923b-cec3-5920-76a1fc1b7ec3@10.9.107.30@o2ib4:16/0 lens 592/3264 e 1 to 0 dl 1557167806 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:36:41 fir-md1-s1 kernel: Lustre: 101911:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages May 06 11:36:43 fir-md1-s1 kernel: Lustre: 102464:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983dc87f2100 x1631588077763136/t0(0) o101->d4ebf645-312a-a45c-d2fb-9a1e61693fc2@10.9.107.16@o2ib4:18/0 lens 592/3264 e 1 to 0 dl 1557167808 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:36:43 fir-md1-s1 kernel: Lustre: 102464:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages May 06 11:36:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting May 06 11:36:45 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 11:36:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.0.62@o2ib4) May 06 11:36:45 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 11:36:47 fir-md1-s1 kernel: Lustre: 102575:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9846e5f7f500 x1631537116120656/t0(0) o101->2c2c4e2a-8072-33c3-de28-eb5583c5c142@10.9.105.52@o2ib4:22/0 lens 592/3264 e 1 to 0 dl 1557167812 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:36:47 fir-md1-s1 kernel: Lustre: 102575:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages May 06 11:36:52 fir-md1-s1 kernel: Lustre: 102653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557167805/real 
1557167805] req@ffff98465a2c0f00 x1632351073839056/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557167812 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 11:36:52 fir-md1-s1 kernel: Lustre: 102653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 11:36:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8776df39-692a-3df9-0874-72e441440742 (at 10.8.18.22@o2ib6) reconnecting May 06 11:36:53 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages May 06 11:36:55 fir-md1-s1 kernel: Lustre: 102431:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98220e621e00 x1631535791099440/t0(0) o101->6ea10cfc-48e7-f6e7-b834-4eb6674e3061@10.9.102.48@o2ib4:0/0 lens 592/3264 e 1 to 0 dl 1557167820 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:36:55 fir-md1-s1 kernel: Lustre: 102431:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 30 previous similar messages May 06 11:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 61a62562-ad73-dc2c-c4dd-497b185dd54e (at 10.8.16.5@o2ib6) reconnecting May 06 11:37:09 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages May 06 11:37:13 fir-md1-s1 kernel: Lustre: 102539:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9843bd755d00 x1631559187969312/t0(0) o101->c1d9f0f7-d490-e556-ed11-756e6b122018@10.9.104.22@o2ib4:18/0 lens 592/3264 e 1 to 0 dl 1557167838 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:37:13 fir-md1-s1 kernel: Lustre: 102539:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 38 previous similar messages May 06 11:37:34 fir-md1-s1 kernel: Lustre: 102653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557167847/real 1557167847] req@ffff98465a2c0f00 x1632351073839056/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557167854 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 
May 06 11:37:34 fir-md1-s1 kernel: Lustre: 102653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 11:37:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f50d7e88-e046-b55a-0fe3-13cbe95d417b (at 10.8.8.33@o2ib6) reconnecting May 06 11:37:41 fir-md1-s1 kernel: Lustre: Skipped 208 previous similar messages May 06 11:37:46 fir-md1-s1 kernel: Lustre: 102382:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff984425451b00 x1631683603563328/t0(0) o101->6dc651d0-2b7a-dd35-f234-bffd4712bc50@10.8.30.23@o2ib6:21/0 lens 592/3264 e 0 to 0 dl 1557167871 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:37:46 fir-md1-s1 kernel: Lustre: 102382:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 54 previous similar messages May 06 11:37:54 fir-md1-s1 kernel: LustreError: 102692:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557167784, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9845ee972640/0xce8853afc45eba74 lrc: 3/1,0 mode: --/PR res: [0x2c001306c:0xa917:0x0].0x0 bits 0x13/0x0 rrc: 234 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102692 timeout: 0 lvb_type: 0 May 06 11:37:54 fir-md1-s1 kernel: LustreError: 102692:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages May 06 11:37:59 fir-md1-s1 kernel: LustreError: 102554:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557167788, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff983aab781680/0xce8853afc48ac402 lrc: 3/1,0 mode: --/PR res: [0x2c001306c:0xa917:0x0].0x0 bits 0x13/0x0 rrc: 245 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102554 timeout: 0 lvb_type: 0 May 06 11:37:59 fir-md1-s1 kernel: LustreError: 
102554:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 30 previous similar messages May 06 11:38:07 fir-md1-s1 kernel: LustreError: 102517:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557167797, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff98225531de80/0xce8853afc4db289c lrc: 3/1,0 mode: --/PR res: [0x2c001306c:0xa917:0x0].0x0 bits 0x13/0x0 rrc: 259 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102517 timeout: 0 lvb_type: 0 May 06 11:38:07 fir-md1-s1 kernel: LustreError: 102517:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 31 previous similar messages May 06 11:38:23 fir-md1-s1 kernel: LustreError: 102383:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557167813, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff982cb801da00/0xce8853afc57df832 lrc: 3/1,0 mode: --/PR res: [0x2c001306c:0xa917:0x0].0x0 bits 0x13/0x0 rrc: 299 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102383 timeout: 0 lvb_type: 0 May 06 11:38:23 fir-md1-s1 kernel: LustreError: 102383:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 47 previous similar messages May 06 11:38:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7292487d-4c3f-beaa-f7a3-4a39edecde1a (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9857c7631c00, cur 1557167921 expire 1557167771 last 1557167694 May 06 11:38:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 11:38:41 fir-md1-s1 kernel: Lustre: 102529:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:1s); client may timeout. 
req@ffff98386eb31800 x1631702568376160/t0(0) o101->53614e9e-2bda-c4b5-5e14-2f81de822c19@10.8.20.32@o2ib6:9/0 lens 592/536 e 0 to 0 dl 1557167920 ref 1 fl Complete:/0/0 rc 0/0 May 06 11:38:41 fir-md1-s1 kernel: Lustre: 102529:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 12 previous similar messages May 06 11:39:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 07898349-5816-6d13-be2d-72e7c3d545c0 (at 10.9.107.12@o2ib4) reconnecting May 06 11:39:09 fir-md1-s1 kernel: Lustre: Skipped 551 previous similar messages May 06 11:45:54 fir-md1-s1 kernel: Lustre: 102572:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557168347/real 1557168347] req@ffff982207331500 x1632351256728960/t0(0) o106->fir-MDT0002@10.9.105.1@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557168354 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 11:45:54 fir-md1-s1 kernel: Lustre: 102572:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 06 11:46:12 fir-md1-s1 kernel: Lustre: 102537:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982a4be9e000 x1632103133700416/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:17/0 lens 480/568 e 0 to 0 dl 1557168377 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:46:12 fir-md1-s1 kernel: Lustre: 102537:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 116 previous similar messages May 06 11:46:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 11:46:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 11:46:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 06 11:46:46 fir-md1-s1 kernel: Lustre: Skipped 888 previous similar messages May 06 11:48:13 fir-md1-s1 kernel: LustreError: 102623:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.105.2@o2ib4) failed to reply 
to blocking AST (req@ffff983cede13900 x1632351292572752 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff984cac6fcc80/0xce8853af8579a6cd lrc: 4/0,0 mode: PR/PR res: [0x2c0023801:0x6c7:0x0].0x0 bits 0x5b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.9.105.2@o2ib4 remote: 0xd3fc16f6ec55ec6 expref: 54 pid: 102649 timeout: 693787 lvb_type: 0 May 06 11:48:13 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.105.2@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 06 11:48:13 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.105.2@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff984cac6fcc80/0xce8853af8579a6cd lrc: 3/0,0 mode: PR/PR res: [0x2c0023801:0x6c7:0x0].0x0 bits 0x5b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.9.105.2@o2ib4 remote: 0xd3fc16f6ec55ec6 expref: 55 pid: 102649 timeout: 0 lvb_type: 0 May 06 11:48:24 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557168497/real 1557168497] req@ffff982641e91500 x1632351294099392/t0(0) o106->fir-MDT0002@10.9.104.25@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557168504 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 11:48:24 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 46 previous similar messages May 06 11:48:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 694a702d-81e9-6b52-2bc0-9b6e0109536d (at 10.9.106.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982cfb400000, cur 1557168533 expire 1557168383 last 1557168306 May 06 11:48:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 11:48:54 fir-md1-s1 kernel: Lustre: 102572:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (185:2s); client may timeout. 
req@ffff982a4be9e000 x1632103133700416/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:17/0 lens 480/536 e 0 to 0 dl 1557168532 ref 1 fl Complete:/0/0 rc 301/301 May 06 11:48:54 fir-md1-s1 kernel: Lustre: 102572:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message May 06 11:52:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7b8ef936-4105-b014-ef94-c36c21315324 (at 10.8.1.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cd971c800, cur 1557168727 expire 1557168577 last 1557168500 May 06 11:52:07 fir-md1-s1 kernel: Lustre: Skipped 79 previous similar messages May 06 11:57:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6a59f5f4-9fe2-de7b-dbb3-3fa96ee5f795 (at 10.9.108.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982c9fd56c00, cur 1557169061 expire 1557168911 last 1557168834 May 06 11:57:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 12:01:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 12:01:30 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages May 06 12:01:32 fir-md1-s1 kernel: LNetError: 101317:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 12:03:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ea3d1456-cd4f-7735-a810-3c3f6db723ce (at 10.9.113.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98283fe71000, cur 1557169424 expire 1557169274 last 1557169197 May 06 12:03:44 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 12:33:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e82f6d89-d034-c003-3b7e-5d3893bce363 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9822b36bfc00, cur 1557171212 expire 1557171062 last 1557170985 May 06 12:33:32 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 06 12:39:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 12:39:43 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 12:43:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 853f3428-85e5-1ec4-67ff-3fc10cee4956 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982565296400, cur 1557171810 expire 1557171660 last 1557171583 May 06 12:43:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 12:43:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 12:43:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 12:47:04 fir-md1-s1 kernel: Lustre: 101919:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557172017/real 1557172017] req@ffff981fc5bbad00 x1632352415805744/t0(0) o104->fir-MDT0002@10.9.108.11@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557172024 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 12:47:04 fir-md1-s1 kernel: Lustre: 101919:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages May 06 12:47:12 fir-md1-s1 kernel: Lustre: 102376:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b2cd50c00 x1631920772192624/t0(0) o101->05e008de-8b88-3f06-a73e-643eeb4c554e@10.9.108.13@o2ib4:17/0 lens 480/568 e 1 to 0 dl 1557172037 ref 2 fl Interpret:/0/0 rc 0/0 May 06 12:47:12 fir-md1-s1 kernel: Lustre: 102376:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages May 06 12:47:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 05e008de-8b88-3f06-a73e-643eeb4c554e (at 10.9.108.13@o2ib4) reconnecting May 06 12:47:18 
fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages May 06 12:47:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.108.13@o2ib4) May 06 12:47:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 12:47:32 fir-md1-s1 kernel: LustreError: 101919:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.108.11@o2ib4) failed to reply to blocking AST (req@ffff981fc5bbad00 x1632352415805744 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff984c97902640/0xce8853b062bea5f2 lrc: 4/0,0 mode: PR/PR res: [0x2c001ca80:0x11bd:0x0].0x0 bits 0x5b/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.9.108.11@o2ib4 remote: 0xc0f4e805bf0ed9a8 expref: 710 pid: 102528 timeout: 697346 lvb_type: 0 May 06 12:47:32 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.108.11@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 06 12:47:32 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.108.11@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff984c97902640/0xce8853b062bea5f2 lrc: 3/0,0 mode: PR/PR res: [0x2c001ca80:0x11bd:0x0].0x0 bits 0x5b/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.9.108.11@o2ib4 remote: 0xc0f4e805bf0ed9a8 expref: 711 pid: 102528 timeout: 0 lvb_type: 0 May 06 12:50:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b18b7a53-be90-2f32-7eae-8c062d22c266 (at 10.9.108.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984709670000, cur 1557172226 expire 1557172076 last 1557171999 May 06 12:50:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 13:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9fb03ea9-0aba-6ac3-73c3-4b8f9d8e2193 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9846712a2c00, cur 1557173189 expire 1557173039 last 1557172962 May 06 13:06:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 13:07:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 13:07:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f7bddeb5-cf2d-7b09-7625-5fdb36d4978d (at 10.8.27.14@o2ib6) in 194 seconds. I think it's dead, and I am evicting it. exp ffff985cf1339000, cur 1557173265 expire 1557173115 last 1557173071 May 06 13:07:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 13:11:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 56f89491-06d9-7455-5e64-eb6ec04ed229 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9835aba5ec00, cur 1557173479 expire 1557173329 last 1557173252 May 06 13:11:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 13:11:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 13:11:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 13:13:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d64a3116-c9b2-082e-3250-a9e5dffa1cb1 (at 10.8.14.5@o2ib6) May 06 13:13:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 13:15:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.20@o2ib4) May 06 13:15:37 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages May 06 13:16:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client af4685da-89b6-0ca2-9a5a-3f2a8408f108 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff981f8d712c00, cur 1557173802 expire 1557173652 last 1557173575 May 06 13:16:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 13:25:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6e6fa5c3-eb03-711c-667c-55b18b265db5 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9826d274e000, cur 1557174358 expire 1557174208 last 1557174131 May 06 13:25:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 13:42:30 fir-md1-s1 kernel: Lustre: 102530:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 06 13:44:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 06 13:44:10 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages May 06 14:00:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f8994245-bed0-7e79-3bee-894973fb0d61 (at 10.8.1.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985ce3307800, cur 1557176420 expire 1557176270 last 1557176193 May 06 14:00:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 14:01:06 fir-md1-s1 kernel: LNetError: 101309:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 14:03:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 42bc888f-d04c-9162-e1f7-365337eb9a74 (at 10.8.10.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c307d5c00, cur 1557176584 expire 1557176434 last 1557176357 May 06 14:03:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 14:18:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 29ba976c-0339-3250-27f8-d032be65fedc (at 10.8.17.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985cf2b86400, cur 1557177484 expire 1557177334 last 1557177257 May 06 14:18:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 14:29:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 0f909ee4-cac2-357a-6726-f3c03147ed59 (at 10.8.14.4@o2ib6) May 06 14:29:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 14:32:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bd62463f-9754-d7dd-c0ae-947965e600cc (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847dd648800, cur 1557178334 expire 1557178184 last 1557178107 May 06 14:32:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 14:33:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5d6d99c9-66e9-1b74-491b-ca51f1cc6e94 (at 10.8.13.24@o2ib6) in 219 seconds. I think it's dead, and I am evicting it. exp ffff983f5dee7000, cur 1557178410 expire 1557178260 last 1557178191 May 06 14:33:30 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 14:33:38 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5f509494-315b-4587-12de-2d45d74dabb1 (at 10.8.13.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983f5ef5f000, cur 1557178418 expire 1557178268 last 1557178191 May 06 14:33:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 06 14:34:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e7f1f31f-82d4-20eb-7b7d-3bdf55af6073 (at 10.8.14.6@o2ib6) in 214 seconds. I think it's dead, and I am evicting it. 
exp ffff982273abec00, cur 1557178486 expire 1557178336 last 1557178272 May 06 14:34:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 14:51:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d64a3116-c9b2-082e-3250-a9e5dffa1cb1 (at 10.8.14.5@o2ib6) May 06 14:51:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 14:52:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.14.9@o2ib6) May 06 14:52:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 14:54:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.112.13@o2ib4) May 06 14:54:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 14:56:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to da4545b1-0147-b1d4-a503-38a728550aa4 (at 10.9.108.54@o2ib4) May 06 14:56:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 14:56:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e52f4411-4e16-a472-807e-43197deb0fe4 (at 10.9.112.17@o2ib4) May 06 14:56:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 14:57:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.33@o2ib4) May 06 14:57:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 14:57:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.14.2@o2ib6) May 06 14:57:55 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 14:58:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c8a471e2-8247-8c8c-ee6a-5df6daf218fc (at 10.8.14.8@o2ib6) May 06 14:58:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 15:00:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8aeff8fe-9dd1-451e-d39b-6b80a387909c (at 10.9.105.12@o2ib4) May 06 15:00:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 15:04:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.105.4@o2ib4) May 06 15:04:08 
fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 06 15:20:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b39fa982-051a-897c-6b07-fb455a7a2cb3 (at 10.8.27.14@o2ib6) May 06 15:20:22 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 15:21:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8291de35-2b9b-a367-f026-cecf1f3c56bb (at 10.8.17.3@o2ib6) May 06 15:21:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 15:23:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6be580c6-fe42-a7a7-821b-08548de77e4c (at 10.8.1.10@o2ib6) May 06 15:23:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 15:43:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2f17b0c4-f886-f86e-900b-0be5810e3dce (at 10.8.1.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ca491a400, cur 1557182615 expire 1557182465 last 1557182388 May 06 15:43:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 16:15:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.13.24@o2ib6) May 06 16:15:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 16:16:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9949a709-c94b-e8c1-4d0d-03290eb6897a (at 10.8.14.6@o2ib6) May 06 16:16:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 16:26:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3c74c277-aa1c-7cd2-ac13-ab73bee6c6b3 (at 10.8.1.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985ccb50f400, cur 1557185201 expire 1557185051 last 1557184974 May 06 16:26:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 16:33:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fd333fb3-e04b-2f1d-b681-c91fc27a7b11 (at 10.8.23.3@o2ib6) May 06 16:33:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 16:34:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.15.4@o2ib6) May 06 16:34:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 16:35:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5ec86d6a-3501-9133-dadf-d9ea83e85f2c (at 10.9.109.46@o2ib4) May 06 16:35:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 16:35:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 675a8ac9-b44f-ac67-d5e8-cc21a647f68f (at 10.9.109.48@o2ib4) May 06 16:35:34 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 06 16:37:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5043d9f7-c975-4c20-83af-36cce6fb97a7 (at 10.9.109.36@o2ib4) May 06 16:37:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 16:39:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9b8bfc5e-e8c9-876d-cd15-d2b1399c3c1c (at 10.9.107.30@o2ib4) May 06 16:39:11 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages May 06 16:46:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 07fb1649-64d8-ae83-4bea-7750f31fccba (at 10.8.27.5@o2ib6) May 06 16:46:22 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 06 16:57:47 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557187060/real 1557187060] req@ffff985cccb49e00 x1632354511401104/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557187067 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 16:57:47 fir-md1-s1 kernel: Lustre: 
102601:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 06 16:57:54 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557187067/real 1557187067] req@ffff985cccb49e00 x1632354511401104/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557187074 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 16:57:55 fir-md1-s1 kernel: Lustre: 103231:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff985c86a66900 x1631364085119280/t0(0) o101->0f8f808f-b03b-81e6-e30e-46ff547f2e45@10.9.113.3@o2ib4:0/0 lens 480/568 e 1 to 0 dl 1557187080 ref 2 fl Interpret:/0/0 rc 0/0 May 06 16:58:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting May 06 16:58:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.113.3@o2ib4) May 06 16:58:01 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 06 16:58:08 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557187081/real 1557187081] req@ffff985cccb49e00 x1632354511401104/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557187088 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 16:58:08 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 16:58:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting May 06 16:58:29 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557187102/real 1557187102] req@ffff985cccb49e00 x1632354511401104/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557187109 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 
06 16:58:29 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 16:58:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting May 06 16:59:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting May 06 16:59:11 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557187144/real 1557187144] req@ffff985cccb49e00 x1632354511401104/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557187151 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 16:59:11 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 16:59:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting May 06 16:59:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 17:00:28 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557187221/real 1557187221] req@ffff985cccb49e00 x1632354511401104/t0(0) o106->fir-MDT0000@10.8.9.9@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557187228 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 17:00:28 fir-md1-s1 kernel: Lustre: 102601:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 06 17:01:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 29aab465-7cf3-868c-3ff7-4901255a2788 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cabafec00, cur 1557187260 expire 1557187110 last 1557187033 May 06 17:01:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 17:01:01 fir-md1-s1 kernel: LNet: Service thread pid 102601 was inactive for 200.61s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
May 06 17:01:01 fir-md1-s1 kernel: Pid: 102601, comm: mdt03_030 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
May 06 17:01:01 fir-md1-s1 kernel: Call Trace:
May 06 17:01:01 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt]
May 06 17:01:01 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt]
May 06 17:01:01 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt]
May 06 17:01:01 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
May 06 17:01:01 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
May 06 17:01:01 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
May 06 17:01:01 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
May 06 17:01:01 fir-md1-s1 kernel: [] 0xffffffffffffffff
May 06 17:01:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557187261.102601
May 06 17:01:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 29aab465-7cf3-868c-3ff7-4901255a2788 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff985cfa2d4400, cur 1557187277 expire 1557187127 last 1557187050 May 06 17:01:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 17:01:17 fir-md1-s1 kernel: Lustre: 102601:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (206:11s); client may timeout. req@ffff985c86a66900 x1631364085119280/t0(0) o101->0f8f808f-b03b-81e6-e30e-46ff547f2e45@10.9.113.3@o2ib4:0/0 lens 480/536 e 1 to 0 dl 1557187266 ref 1 fl Complete:/0/0 rc 301/301 May 06 17:01:17 fir-md1-s1 kernel: LNet: Service thread pid 102601 completed after 216.54s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 06 18:38:17 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557193090/real 1557193090] req@ffff9828ed6e2d00 x1632356179193680/t0(0) o106->fir-MDT0002@10.8.20.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557193097 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 18:38:17 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 06 18:38:25 fir-md1-s1 kernel: Lustre: 102667:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982959d7bf00 x1632104191660000/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:0/0 lens 480/568 e 1 to 0 dl 1557193110 ref 2 fl Interpret:/0/0 rc 0/0 May 06 18:38:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a63699e6-d0e7-040a-1db9-cc9068a34f0c (at 10.8.20.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984cdc8d4c00, cur 1557193111 expire 1557192961 last 1557192884 May 06 18:38:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 18:38:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 18:38:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 06 18:38:31 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 06 18:38:38 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557193111/real 1557193111] req@ffff9828ed6e2d00 x1632356179193680/t0(0) o106->fir-MDT0002@10.8.20.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557193118 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 18:38:38 fir-md1-s1 kernel: Lustre: 102477:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 18:38:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a63699e6-d0e7-040a-1db9-cc9068a34f0c (at 10.8.20.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cf7c42400, cur 1557193122 expire 1557192972 last 1557192895 May 06 18:38:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 19:07:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 718e2070-f49a-08c6-62d7-7d002d5b938d (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cf9a98800, cur 1557194846 expire 1557194696 last 1557194619 May 06 19:14:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a2b26b96-a929-8f63-cbb2-ebc7ce2d142e (at 10.8.1.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985cd64adc00, cur 1557195269 expire 1557195119 last 1557195042 May 06 19:14:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 19:14:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a2b26b96-a929-8f63-cbb2-ebc7ce2d142e (at 10.8.1.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c16a4f800, cur 1557195284 expire 1557195134 last 1557195057 May 06 19:14:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 20:08:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 48bf043f-4a40-2dca-0013-959a568eb747 (at 10.8.1.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cfe183800, cur 1557198509 expire 1557198359 last 1557198282 May 06 20:08:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 48bf043f-4a40-2dca-0013-959a568eb747 (at 10.8.1.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9858f1ac9800, cur 1557198513 expire 1557198363 last 1557198286 May 06 20:08:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 20:56:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d3ad20a5-81a2-915a-58a3-1542c85784cf (at 10.9.107.53@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9858f1ac8000, cur 1557201411 expire 1557201261 last 1557201184 May 06 21:46:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 77874b32-e186-7da5-3231-7675bcd6ec17 (at 10.9.102.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cb6f20c00, cur 1557204383 expire 1557204233 last 1557204156 May 06 21:46:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 06 21:46:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 77874b32-e186-7da5-3231-7675bcd6ec17 (at 10.9.102.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985c8fb58800, cur 1557204395 expire 1557204245 last 1557204168 May 06 21:46:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 22:18:12 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557206285/real 1557206285] req@ffff9826807ea400 x1632360678728208/t0(0) o106->fir-MDT0002@10.9.102.33@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557206292 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 22:18:12 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 22:18:19 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557206292/real 1557206292] req@ffff982b61dc9800 x1632360678728864/t0(0) o106->fir-MDT0002@10.9.102.33@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557206299 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 22:18:20 fir-md1-s1 kernel: Lustre: 102400:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9821d5e12a00 x1632104722964592/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:25/0 lens 480/568 e 1 to 0 dl 1557206305 ref 2 fl Interpret:/0/0 rc 0/0 May 06 22:18:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 22:18:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 06 22:18:33 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557206306/real 1557206306] req@ffff982b61dc9800 x1632360678728864/t0(0) o106->fir-MDT0002@10.9.102.33@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557206313 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 22:18:33 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 06 22:18:38 
fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a99a0fd8-be49-f5cf-71c5-91c5b8a9ee37 (at 10.9.102.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98271aa38000, cur 1557206318 expire 1557206168 last 1557206091 May 06 22:18:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 06 22:18:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 06 22:18:54 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557206327/real 1557206327] req@ffff9826807ea400 x1632360678728208/t0(0) o106->fir-MDT0002@10.9.102.33@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557206334 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 22:18:54 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 06 22:18:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a99a0fd8-be49-f5cf-71c5-91c5b8a9ee37 (at 10.9.102.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c86237c00, cur 1557206335 expire 1557206185 last 1557206108 May 06 22:18:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 22:27:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cf2b83000, cur 1557206869 expire 1557206719 last 1557206642 May 06 22:27:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985914e1b000, cur 1557206875 expire 1557206725 last 1557206648 May 06 22:27:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 06 22:53:17 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 3fa39b6b-68bb-1570-c431-26d39d28b172 (at 10.9.103.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985873b6f000, cur 1557208397 expire 1557208247 last 1557208170 May 06 22:53:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 42c27884-7f5f-6ccb-073b-56ac764ed5ce (at 10.9.103.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cd64a8000, cur 1557208407 expire 1557208257 last 1557208180 May 06 22:53:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 42c27884-7f5f-6ccb-073b-56ac764ed5ce (at 10.9.103.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9846712f9800, cur 1557208409 expire 1557208259 last 1557208182 May 06 23:24:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d0c39c1f-f040-e7be-8021-8fbd0c8d72d2 (at 10.8.11.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982c9fd57000, cur 1557210270 expire 1557210120 last 1557210043 May 06 23:39:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1e35dbc2-c2aa-b6ce-6f18-8939c99fde0f (at 10.8.30.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cfc6a5000, cur 1557211179 expire 1557211029 last 1557210952 May 06 23:39:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 00:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4d220cc4-d3e3-4651-12cb-6e4bd675cf57 (at 10.8.22.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98271aa3d000, cur 1557212452 expire 1557212302 last 1557212225 May 07 00:00:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 00:56:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 02143cf9-a33a-721d-1d20-5e9ca2a6e670 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983844705000, cur 1557215774 expire 1557215624 last 1557215547 May 07 00:56:14 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 00:57:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 07 02:03:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8291e59c-ac50-4b3f-bf9b-4a485234c804 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98484c227c00, cur 1557219780 expire 1557219630 last 1557219553 May 07 02:03:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 02:08:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 07 02:08:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 02:24:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b9d08568-0dfb-b72b-5b71-42d51a8d04c5 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98320a34d000, cur 1557221074 expire 1557220924 last 1557220847 May 07 02:24:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 02:27:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 00e1f138-c014-8d58-b144-b69a54760ae8 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98288b3bcc00, cur 1557221256 expire 1557221106 last 1557221029 May 07 02:27:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 02:29:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 07 02:29:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 02:31:26 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 07 02:31:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 02:49:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 32e552d9-553a-99e0-f22a-c15ac116b169 (at 10.9.108.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98283fe73000, cur 1557222571 expire 1557222421 last 1557222344 May 07 02:49:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 02:49:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 32e552d9-553a-99e0-f22a-c15ac116b169 (at 10.9.108.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cfa2e1000, cur 1557222574 expire 1557222424 last 1557222347 May 07 03:19:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 02ea929c-a46f-c52b-c40b-0790522b7de6 (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983c834b2000, cur 1557224375 expire 1557224225 last 1557224148 May 07 03:19:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 03:20:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.1.29@o2ib6) May 07 03:20:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 03:23:52 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557224624/real 1557224624] req@ffff982b673d5700 x1632366533272880/t0(0) o106->fir-MDT0002@10.9.102.57@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557224631 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 03:23:59 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557224632/real 1557224632] req@ffff982b673d5700 x1632366533272880/t0(0) o106->fir-MDT0002@10.9.102.57@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557224639 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 03:23:59 fir-md1-s1 kernel: Lustre: 102546:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981f1ea21200 x1632105139114816/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:4/0 lens 480/568 e 1 to 0 dl 1557224644 ref 2 fl Interpret:/0/0 rc 0/0 May 07 03:23:59 fir-md1-s1 kernel: Lustre: 102546:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 03:24:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 03:24:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 03:24:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 03:24:13 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557224646/real 1557224646] req@ffff982b673d5700 x1632366533272880/t0(0) 
o106->fir-MDT0002@10.9.102.57@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557224653 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 03:24:13 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 03:24:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 03:24:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 03:24:34 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557224667/real 1557224667] req@ffff982b673d5700 x1632366533272880/t0(0) o106->fir-MDT0002@10.9.102.57@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557224674 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 03:24:34 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 03:24:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 03:24:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 03:25:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 03:25:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 03:25:16 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557224709/real 1557224709] req@ffff982b673d5700 x1632366533272880/t0(0) o106->fir-MDT0002@10.9.102.57@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557224716 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 03:25:16 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 03:25:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) 
reconnecting May 07 03:25:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 03:25:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 03:25:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 03:26:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 03:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 385ad677-c9f9-e2fe-015c-c152ce073ee0 (at 10.9.102.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98271aa3cc00, cur 1557224792 expire 1557224642 last 1557224565 May 07 03:26:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 03:26:33 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557224786/real 1557224786] req@ffff982b673d5700 x1632366533272880/t0(0) o106->fir-MDT0002@10.9.102.57@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557224793 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 03:26:33 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 07 03:26:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 03:26:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 03:26:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 385ad677-c9f9-e2fe-015c-c152ce073ee0 (at 10.9.102.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cefb13400, cur 1557224794 expire 1557224644 last 1557224567 May 07 03:26:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 03:51:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d4b920a7-fa43-47d1-36f8-6c5e2715eb6d (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff983e20a60400, cur 1557226306 expire 1557226156 last 1557226079 May 07 03:54:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:01:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 08876076-308b-16ed-fef6-3539ec3414c1 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9823abf4ec00, cur 1557226891 expire 1557226741 last 1557226664 May 07 04:01:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 04:01:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:01:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 04:08:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2f3c3b21-f910-699c-fb11-7d9409e39ccb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982286d84c00, cur 1557227337 expire 1557227187 last 1557227110 May 07 04:08:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 04:09:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:09:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 04:14:47 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557227680/real 1557227680] req@ffff9828bba34800 x1632367552284976/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557227687 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 04:14:55 fir-md1-s1 kernel: Lustre: 102651:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9820783a1200 x1631544941456624/t0(0) o101->cdee964a-caf8-9055-00d7-e4e0b6d655dc@10.8.27.31@o2ib6:0/0 lens 480/568 e 1 to 0 dl 1557227700 ref 2 fl Interpret:/0/0 rc 0/0 May 07 04:15:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 
cdee964a-caf8-9055-00d7-e4e0b6d655dc (at 10.8.27.31@o2ib6) reconnecting May 07 04:15:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 04:15:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.31@o2ib6) May 07 04:15:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 04:15:08 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557227701/real 1557227701] req@ffff9828bba34800 x1632367552284976/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557227708 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 04:15:08 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 04:15:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client cdee964a-caf8-9055-00d7-e4e0b6d655dc (at 10.8.27.31@o2ib6) reconnecting May 07 04:15:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client cdee964a-caf8-9055-00d7-e4e0b6d655dc (at 10.8.27.31@o2ib6) reconnecting May 07 04:15:50 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557227743/real 1557227743] req@ffff9828bba34800 x1632367552284976/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557227750 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 04:15:50 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 04:16:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client cdee964a-caf8-9055-00d7-e4e0b6d655dc (at 10.8.27.31@o2ib6) reconnecting May 07 04:16:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.31@o2ib6) May 07 04:16:26 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 04:16:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client cdee964a-caf8-9055-00d7-e4e0b6d655dc (at 10.8.27.31@o2ib6) reconnecting May 07 04:16:47 
fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 04:17:07 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557227820/real 1557227820] req@ffff9828bba34800 x1632367552284976/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557227827 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 04:17:07 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 07 04:17:14 fir-md1-s1 kernel: LustreError: 102400:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff9828bba34800 x1632367552284976 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff985b4963b600/0xce8853b28d772893 lrc: 4/0,0 mode: PW/PW res: [0x2000217fe:0x35:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x475abc83f1787ba8 expref: 26 pid: 102611 timeout: 753248 lvb_type: 0 May 07 04:17:14 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 07 04:17:14 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff985b4963b600/0xce8853b28d772893 lrc: 3/0,0 mode: PW/PW res: [0x2000217fe:0x35:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x475abc83f1787ba8 expref: 27 pid: 102611 timeout: 0 lvb_type: 0 May 07 04:17:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2ce2ff3c-6f50-7561-f990-2838c405e040 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984114bb9c00, cur 1557227837 expire 1557227687 last 1557227610 May 07 04:17:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 04:24:13 fir-md1-s1 kernel: Lustre: 102477:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98272ee37200 x1632105175592656/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/568 e 0 to 0 dl 1557228258 ref 2 fl Interpret:/0/0 rc 0/0 May 07 04:24:19 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557228228/real 1557228228] req@ffff9823fc4cbf00 x1632367736681664/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557228259 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 04:24:19 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 04:24:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 04:24:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 04:24:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 07 04:24:19 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 04:24:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b819a236-3c20-45c6-c515-a5f31f7fdb5f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984ca13a6000, cur 1557228277 expire 1557228127 last 1557228050 May 07 04:24:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 04:29:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5e0de70d-f314-c595-2c05-b0f5e1db8013 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984cc0108000, cur 1557228541 expire 1557228391 last 1557228314 May 07 04:29:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 04:34:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 74fb56c5-8bc6-38a9-8624-788945b7232f (at 10.9.115.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985914e18800, cur 1557228864 expire 1557228714 last 1557228637 May 07 04:34:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 04:34:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:34:49 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 07 04:39:22 fir-md1-s1 kernel: Lustre: 102370:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557229155/real 1557229155] req@ffff982b50fa2d00 x1632368046535536/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557229162 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 04:39:30 fir-md1-s1 kernel: Lustre: 102363:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982883fe1b00 x1631544486182080/t0(0) o101->55b02c38-d9ce-c2f6-066c-e168569494ff@10.9.105.33@o2ib4:5/0 lens 480/568 e 1 to 0 dl 1557229175 ref 2 fl Interpret:/0/0 rc 0/0 May 07 04:39:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 55b02c38-d9ce-c2f6-066c-e168569494ff (at 10.9.105.33@o2ib4) reconnecting May 07 04:39:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 55b02c38-d9ce-c2f6-066c-e168569494ff (at 10.9.105.33@o2ib4) reconnecting May 07 04:40:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 55b02c38-d9ce-c2f6-066c-e168569494ff (at 10.9.105.33@o2ib4) reconnecting May 07 04:40:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 04:41:49 fir-md1-s1 kernel: LustreError: 102370:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST 
(req@ffff982b50fa2d00 x1632368046535536 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff98225e52e0c0/0xce8853b29d0948a0 lrc: 4/0,0 mode: PW/PW res: [0x200021979:0x7:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x31f985543a6c7eca expref: 26 pid: 102618 timeout: 754723 lvb_type: 0 May 07 04:41:49 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 07 04:41:49 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff98225e52e0c0/0xce8853b29d0948a0 lrc: 3/0,0 mode: PW/PW res: [0x200021979:0x7:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x31f985543a6c7eca expref: 27 pid: 102618 timeout: 0 lvb_type: 0 May 07 04:42:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a8459b8b-f733-b042-6d1e-7ebe1321b89f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98227d22e800, cur 1557229345 expire 1557229195 last 1557229118 May 07 04:42:25 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 04:47:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 72f7533c-a81d-633d-f6a2-6cb31c1ace4e (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9836ce217800, cur 1557229647 expire 1557229497 last 1557229420 May 07 04:47:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 04:48:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:48:01 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages May 07 04:54:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f3b84fa8-506b-6fb1-c444-1dbe27e3ce68 (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9822c0215400, cur 1557230061 expire 1557229911 last 1557229834 May 07 04:54:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 05:01:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client dc960f0f-3ce7-f6bf-812f-a2e5fe5f5a06 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9831ecebd000, cur 1557230507 expire 1557230357 last 1557230280 May 07 05:01:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 05:02:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.23@o2ib6) May 07 05:02:04 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 05:13:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7872092f-2d48-c973-f6b4-42b06ec9c7dc (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982b987d0000, cur 1557231190 expire 1557231040 last 1557230963 May 07 05:13:10 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 05:14:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.9@o2ib6) May 07 05:14:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 05:28:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7dbd9b09-ecbf-56f3-78ec-84aa8794d98c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984808bb8c00, cur 1557232110 expire 1557231960 last 1557231883 May 07 05:28:30 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 05:29:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 05:29:22 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 05:45:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8585c42f-96ef-d59f-ea1a-9cf9f34c62ca (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9823e06be000, cur 1557233143 expire 1557232993 last 1557232916 May 07 05:45:43 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 05:46:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 05:46:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 05:56:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 57f5ff3d-ed7e-8bd2-f4f8-c6cbe70ab7a7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981cd818e400, cur 1557233808 expire 1557233658 last 1557233581 May 07 05:56:48 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 05:57:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 05:57:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 06:09:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 06:09:10 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 06:09:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f931f4dd-24d0-4e09-5490-0594b1a39382 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9831fa240000, cur 1557234576 expire 1557234426 last 1557234349 May 07 06:09:36 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 06:22:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ba0442bc-14d6-8675-c23a-8dc65b1fa6a6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983e90a28800, cur 1557235362 expire 1557235212 last 1557235135 May 07 06:22:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 06:23:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 06:23:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 06:35:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 56121a8a-fa26-ef94-bb8a-0996ba74499b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983673b83400, cur 1557236138 expire 1557235988 last 1557235911 May 07 06:35:38 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 06:36:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 06:36:46 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 06:51:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 06:51:56 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 06:52:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0ceb4a36-d485-0e50-ab89-95eb861c4946 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984703225c00, cur 1557237145 expire 1557236995 last 1557236918 May 07 06:52:25 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 07:07:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 833403f0-6f37-5af2-6362-829570a40278 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982b68b23c00, cur 1557238029 expire 1557237879 last 1557237802 May 07 07:07:09 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 07 07:08:55 fir-md1-s1 kernel: Lustre: 102672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557238128/real 1557238128] req@ffff9822c4c8c800 x1632370845738896/t0(0) o106->fir-MDT0002@10.9.101.37@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557238135 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 07:08:55 fir-md1-s1 kernel: Lustre: 102672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages May 07 07:09:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 07:09:00 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 07 07:09:03 fir-md1-s1 kernel: Lustre: 102593:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9822c84cc500 x1632105261596160/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:8/0 lens 480/568 e 1 to 0 dl 1557238148 ref 2 fl Interpret:/0/0 rc 0/0 May 07 07:09:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 07:09:09 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 07:09:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 07:09:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 07:10:12 fir-md1-s1 kernel: Lustre: 102672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557238205/real 1557238205] req@ffff9822c4c8c800 x1632370845738896/t0(0) o106->fir-MDT0002@10.9.101.37@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557238212 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 07:10:12 fir-md1-s1 
kernel: Lustre: 102672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
May 07 07:10:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting
May 07 07:10:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
May 07 07:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting
May 07 07:11:58 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages
May 07 07:12:09 fir-md1-s1 kernel: LNet: Service thread pid 102672 was inactive for 200.40s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
May 07 07:12:09 fir-md1-s1 kernel: Pid: 102672, comm: mdt00_092 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
May 07 07:12:09 fir-md1-s1 kernel: Call Trace:
May 07 07:12:09 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt]
May 07 07:12:09 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt]
May 07 07:12:09 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt]
May 07 07:12:09 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
May 07 07:12:09 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
May 07 07:12:09 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
May 07 07:12:09 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
May 07 07:12:09 fir-md1-s1 kernel: [] 0xffffffffffffffff
May 07 07:12:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557238329.102672
May 07 07:12:33 fir-md1-s1 kernel: LNet: Service thread pid 102672 completed after 224.38s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
May 07 07:23:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bfdafa74-3349-cb55-da4a-642f21523763 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98358b39d800, cur 1557239004 expire 1557238854 last 1557238777
May 07 07:23:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages
May 07 07:23:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6)
May 07 07:23:48 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages
May 07 07:26:53 fir-md1-s1 kernel: Lustre: 102568:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557239206/real 1557239206] req@ffff9822c4c88300 x1632371157218256/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557239213 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
May 07 07:26:53 fir-md1-s1 kernel: Lustre: 102568:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages
May 07 07:27:01 fir-md1-s1 kernel: Lustre: 102708:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982afd3c8000 x1632105268322096/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:6/0 lens 480/568 e 1 to 0 dl 1557239226 ref 2 fl Interpret:/0/0 rc 0/0
May 07 07:27:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting
May 07 07:27:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
May 07 07:27:14 fir-md1-s1
kernel: Lustre: 102422:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557239227/real 1557239227] req@ffff9822c4c8a400 x1632371157218320/t0(0) o106->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557239234 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 07:27:14 fir-md1-s1 kernel: Lustre: 102422:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 07 07:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 03736220-e9e7-7d88-5819-8709177dceb7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9840c3200800, cur 1557239726 expire 1557239576 last 1557239499 May 07 07:35:26 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 07:36:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 07:36:25 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 07 07:37:59 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557239872/real 1557239872] req@ffff9823a5f62400 x1632371355047312/t0(0) o106->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557239879 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 07:37:59 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 07:38:07 fir-md1-s1 kernel: Lustre: 102564:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981f02e50c00 x1632105272737408/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:12/0 lens 480/568 e 1 to 0 dl 1557239892 ref 2 fl Interpret:/0/0 rc 0/0 May 07 07:38:07 fir-md1-s1 kernel: Lustre: 102564:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 07:38:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 07:47:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cd42bd14-63e7-d4ae-3f6d-cfdf4625d90c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984caa339c00, cur 1557240473 expire 1557240323 last 1557240246 May 07 07:47:53 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 07:48:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 07:48:29 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 07 07:57:54 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557241067/real 1557241067] req@ffff9833622d2a00 x1632371704757568/t0(0) o106->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557241074 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 07:57:54 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 07 07:58:02 fir-md1-s1 kernel: Lustre: 102449:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98375ab58c00 x1631534658033776/t0(0) o101->21db4e74-db2a-768a-66c3-cfe236936806@10.8.2.22@o2ib6:7/0 lens 480/568 e 1 to 0 dl 1557241087 ref 2 fl Interpret:/0/0 rc 0/0 May 07 07:58:02 fir-md1-s1 kernel: Lustre: 102449:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 07:58:08 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557241081/real 1557241081] req@ffff9833622d2a00 x1632371704757568/t0(0) o106->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557241088 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 07:58:08 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous 
similar message May 07 07:58:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 21db4e74-db2a-768a-66c3-cfe236936806 (at 10.8.2.22@o2ib6) reconnecting May 07 07:58:29 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557241102/real 1557241102] req@ffff9833622d2a00 x1632371704757568/t0(0) o106->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557241109 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 07:58:29 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 07:58:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 21db4e74-db2a-768a-66c3-cfe236936806 (at 10.8.2.22@o2ib6) reconnecting May 07 07:58:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to e3f794ef-23d0-ce88-1082-25d135cb6fa4 (at 10.8.2.22@o2ib6) May 07 07:58:29 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 07 07:58:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 21db4e74-db2a-768a-66c3-cfe236936806 (at 10.8.2.22@o2ib6) reconnecting May 07 07:59:11 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557241144/real 1557241144] req@ffff9833622d2a00 x1632371704757568/t0(0) o106->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557241151 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 07:59:11 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 07:59:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 21db4e74-db2a-768a-66c3-cfe236936806 (at 10.8.2.22@o2ib6) reconnecting May 07 07:59:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 21db4e74-db2a-768a-66c3-cfe236936806 (at 10.8.2.22@o2ib6) reconnecting May 07 07:59:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 21db4e74-db2a-768a-66c3-cfe236936806 (at 10.8.2.22@o2ib6) reconnecting May 07 08:00:15 fir-md1-s1 kernel: 
Lustre: fir-MDT0000: Client 21db4e74-db2a-768a-66c3-cfe236936806 (at 10.8.2.22@o2ib6) reconnecting May 07 08:00:28 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557241221/real 1557241221] req@ffff9833622d2a00 x1632371704757568/t0(0) o106->fir-MDT0000@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557241228 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 08:00:28 fir-md1-s1 kernel: Lustre: 102590:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 07 08:00:50 fir-md1-s1 kernel: Lustre: 102388:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9822f9967200 x1632822835034032/t0(0) o101->9de88c71-1c16-60dc-b5a4-a51bb557fafe@10.8.17.3@o2ib6:25/0 lens 480/568 e 0 to 0 dl 1557241255 ref 2 fl Interpret:/0/0 rc 0/0 May 07 08:00:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9de88c71-1c16-60dc-b5a4-a51bb557fafe (at 10.8.17.3@o2ib6) reconnecting May 07 08:00:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 08:01:07 fir-md1-s1 kernel: LNet: Service thread pid 102590 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: May 07 08:01:07 fir-md1-s1 kernel: Pid: 102590, comm: mdt01_068 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 May 07 08:01:07 fir-md1-s1 kernel: Call Trace: May 07 08:01:07 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] May 07 08:01:07 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] May 07 08:01:07 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] May 07 08:01:07 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 07 08:01:07 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 07 08:01:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 07 08:01:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 07 08:01:07 fir-md1-s1 kernel: [] 0xffffffffffffffff May 07 08:01:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557241267.102590 May 07 08:01:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bda33544-34db-f855-cae6-56a4d4826687 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9849b5760c00, cur 1557241276 expire 1557241126 last 1557241049 May 07 08:01:16 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 08:01:16 fir-md1-s1 kernel: LNet: Service thread pid 102590 completed after 208.94s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 07 08:42:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7676a4e6-cd18-5e4a-0855-17e7cc11eb46 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9853226d2c00, cur 1557243735 expire 1557243585 last 1557243508 May 07 08:42:15 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 08:47:36 fir-md1-s1 kernel: Lustre: 102431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557244049/real 1557244049] req@ffff982b1df77200 x1632372536160096/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557244056 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 08:47:36 fir-md1-s1 kernel: Lustre: 102431:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages May 07 08:47:44 fir-md1-s1 kernel: Lustre: 102744:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b21705a00 x1632105308803152/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:19/0 lens 480/568 e 1 to 0 dl 1557244069 ref 2 fl Interpret:/0/0 rc 0/0 May 07 08:47:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 08:47:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 08:47:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 08:47:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 07 08:47:57 fir-md1-s1 kernel: Lustre: 102647:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557244070/real 1557244070] req@ffff982b1df72d00 x1632372536160224/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557244077 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 08:47:57 
fir-md1-s1 kernel: Lustre: 102647:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 07 08:48:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 08:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 08:48:39 fir-md1-s1 kernel: Lustre: 102431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557244112/real 1557244112] req@ffff982b1df77200 x1632372536160096/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557244119 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 08:48:39 fir-md1-s1 kernel: Lustre: 102431:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 07 08:49:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 08:49:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 08:49:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 08:49:14 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 08:49:56 fir-md1-s1 kernel: Lustre: 102647:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557244189/real 1557244189] req@ffff982b1df72d00 x1632372536160224/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557244196 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 08:49:56 fir-md1-s1 kernel: Lustre: 102647:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages May 07 08:50:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 08:50:38 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 08:50:49 fir-md1-s1 kernel: LNet: Service thread pid 102647 was inactive 
for 200.24s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: May 07 08:50:49 fir-md1-s1 kernel: Pid: 102647, comm: mdt00_085 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 May 07 08:50:49 fir-md1-s1 kernel: Call Trace: May 07 08:50:49 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] May 07 08:50:49 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] May 07 08:50:49 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] May 07 08:50:49 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 07 08:50:49 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 07 08:50:49 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 07 08:50:49 fir-md1-s1 kernel: [] 0xffffffffffffffff May 07 08:50:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557244249.102647 May 07 08:50:49 fir-md1-s1 kernel: Pid: 102431, comm: mdt00_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 May 07 08:50:49 fir-md1-s1 kernel: Call Trace: May 07 08:50:49 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] May 07 08:50:49 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] May 
07 08:50:49 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] May 07 08:50:49 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] May 07 08:50:49 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] May 07 08:50:50 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 07 08:50:50 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 07 08:50:50 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 07 08:50:50 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 07 08:50:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 07 08:50:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 07 08:50:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 07 08:50:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 07 08:50:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 07 08:50:50 fir-md1-s1 kernel: [] 0xffffffffffffffff May 07 08:50:59 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2480fec0-b3bc-44e3-a047-4090437b4b7e (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985cda5a2c00, cur 1557244259 expire 1557244109 last 1557244032 May 07 08:50:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 08:51:11 fir-md1-s1 kernel: LNet: Service thread pid 102647 completed after 221.45s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 07 08:51:11 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message May 07 08:55:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fbe55575-ffec-483a-9f4b-6c9765813e37 (at 10.9.112.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985c75ef2c00, cur 1557244518 expire 1557244368 last 1557244291 May 07 08:55:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 08:56:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 07 08:56:03 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 09:03:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2ba93059-c680-b159-10d8-ab574e9bf585 (at 10.9.108.5@o2ib4) May 07 09:03:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 09:05:18 fir-md1-s1 kernel: Lustre: 102422:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557245111/real 1557245111] req@ffff982764bccb00 x1632372815658032/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557245118 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 09:05:18 fir-md1-s1 kernel: Lustre: 102422:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages May 07 09:05:26 fir-md1-s1 kernel: Lustre: 102568:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982b3d389500 x1632105324690384/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:1/0 lens 480/568 e 1 to 0 dl 1557245131 ref 2 fl Interpret:/0/0 rc 0/0 May 07 09:05:26 fir-md1-s1 kernel: Lustre: 102568:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 09:05:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 09:05:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 09:05:39 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557245132/real 1557245132] req@ffff982764bcdd00 x1632372815661424/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557245139 ref 1 fl Rpc:X/2/ffffffff rc 
0/-1 May 07 09:05:39 fir-md1-s1 kernel: Lustre: 102473:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 09:05:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 09:05:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c9336415-f6c5-883d-24d5-6a752dd59401 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982985f04000, cur 1557245158 expire 1557245008 last 1557244931 May 07 09:05:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 09:14:29 fir-md1-s1 kernel: Lustre: 102504:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557245662/real 1557245662] req@ffff982764bcf800 x1632372967356624/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557245669 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 09:14:29 fir-md1-s1 kernel: Lustre: 102504:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 09:14:37 fir-md1-s1 kernel: Lustre: 102731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982232005d00 x1632105339575328/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:12/0 lens 480/568 e 1 to 0 dl 1557245682 ref 2 fl Interpret:/0/0 rc 0/0 May 07 09:14:37 fir-md1-s1 kernel: Lustre: 102731:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 09:14:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 09:14:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 09:14:43 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages May 07 09:18:14 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow 
reply: [sent 1557245887/real 1557245887] req@ffff982b3d7ea100 x1632373029747504/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557245894 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 09:18:14 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages May 07 09:18:32 fir-md1-s1 kernel: Lustre: 102713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982a68294b00 x1632105341346912/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:7/0 lens 480/568 e 0 to 0 dl 1557245917 ref 2 fl Interpret:/0/0 rc 0/0 May 07 09:18:32 fir-md1-s1 kernel: Lustre: 102713:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 09:18:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 08569354-3ed8-bb43-01f8-a20f1ea467bc (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9843c8f57800, cur 1557245915 expire 1557245765 last 1557245688 May 07 09:18:35 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 07 09:27:27 fir-md1-s1 kernel: Lustre: 102716:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9822d3d90000 x1632105345995936/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:2/0 lens 480/568 e 1 to 0 dl 1557246452 ref 2 fl Interpret:/0/0 rc 0/0 May 07 09:27:27 fir-md1-s1 kernel: Lustre: 102716:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 09:27:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 09:27:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 09:27:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 09:27:33 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages 
May 07 09:27:43 fir-md1-s1 kernel: Lustre: 102722:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557246432/real 1557246432] req@ffff982c77a18600 x1632373179187456/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557246463 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 09:27:43 fir-md1-s1 kernel: LustreError: 102504:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from glimpse AST (req@ffff9828f7b83900 x1632373179187568 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff983ac4751440/0xce8853b3d7788684 lrc: 4/0,0 mode: PW/PW res: [0x2c00240e1:0x4:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0xdc0398ca4733749f expref: 41 pid: 102579 timeout: 0 lvb_type: 0 May 07 09:27:43 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 07 09:27:43 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 331s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff983ac4751440/0xce8853b3d7788684 lrc: 4/0,0 mode: PW/PW res: [0x2c00240e1:0x4:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0xdc0398ca4733749f expref: 42 pid: 102579 timeout: 0 lvb_type: 0 May 07 09:27:43 fir-md1-s1 kernel: Lustre: 102722:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 07 09:33:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7ac77253-ddd7-7c91-beea-fd5dfde008b3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9847d7fa3800, cur 1557246824 expire 1557246674 last 1557246597 May 07 09:33:44 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 07 09:43:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 07 09:43:07 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 07 09:50:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0bba6657-cd54-9ad7-b9ba-120aef5c9cad (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983bedb97800, cur 1557247841 expire 1557247691 last 1557247614 May 07 09:50:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 10:00:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 10:00:09 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 10:13:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e0c7c099-87a1-d05e-fd38-39feb1f5c5a1 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9831f7f2b800, cur 1557249217 expire 1557249067 last 1557248990 May 07 10:13:37 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 10:13:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 07 10:13:51 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 10:17:38 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d358614b-4ed7-3580-49ee-f33a51ff11d7 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982a84206000, cur 1557249458 expire 1557249308 last 1557249231 May 07 10:17:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 10:24:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e8caa1e0-95d9-b020-e242-0b07ce07b35c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982e60f19000, cur 1557249869 expire 1557249719 last 1557249642 May 07 10:24:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 10:24:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 10:24:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 10:42:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1ef4bd81-95f3-029a-2025-6401b9a7961e (at 10.8.13.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9849bea46400, cur 1557250954 expire 1557250804 last 1557250727 May 07 10:42:34 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 10:44:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 10:44:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 10:54:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 40f9c724-6bc8-1588-d4af-684773732c28 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982ced222800, cur 1557251645 expire 1557251495 last 1557251418 May 07 10:54:05 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 10:54:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 10:54:27 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 10:59:14 fir-md1-s1 kernel: Lustre: 102450:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557251947/real 1557251947] req@ffff98471df84800 x1632374656783232/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557251954 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 10:59:22 fir-md1-s1 kernel: Lustre: 102692:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984a3d69e000 x1631676468882640/t0(0) o36->fdc52cb1-2cc9-0d98-0f2b-dc082fc53acc@10.8.8.37@o2ib6:27/0 lens 512/448 e 1 to 0 dl 1557251967 ref 2 fl Interpret:/0/0 rc 0/0 May 07 10:59:22 fir-md1-s1 kernel: Lustre: 102692:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 10:59:22 fir-md1-s1 kernel: Lustre: 102566:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98322debbf00 x1631610962035728/t0(0) o101->1c578c74-5128-6e3f-cdf7-83221a90bc4e@10.8.27.8@o2ib6:27/0 lens 576/3264 e 1 to 0 dl 1557251967 ref 2 fl Interpret:/0/0 rc 0/0 May 07 10:59:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fdc52cb1-2cc9-0d98-0f2b-dc082fc53acc (at 10.8.8.37@o2ib6) reconnecting May 07 10:59:48 fir-md1-s1 kernel: Lustre: 102664:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98245020c500 x1631559537922608/t0(0) o101->044042bf-dd57-7ee7-fd56-cb18003c928b@10.8.7.32@o2ib6:23/0 lens 576/3264 e 0 to 0 dl 1557251993 ref 2 fl Interpret:/0/0 rc 0/0 May 07 10:59:48 fir-md1-s1 kernel: Lustre: 
102664:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 10:59:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fdc52cb1-2cc9-0d98-0f2b-dc082fc53acc (at 10.8.8.37@o2ib6) reconnecting May 07 10:59:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 10:59:51 fir-md1-s1 kernel: Lustre: 102439:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff984673684b00 x1631550541124032/t0(0) o101->024f7538-830d-bf6f-afb4-1c31cea1bee4@10.8.8.5@o2ib6:26/0 lens 576/3264 e 0 to 0 dl 1557251996 ref 2 fl Interpret:/0/0 rc 0/0 May 07 10:59:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 044042bf-dd57-7ee7-fd56-cb18003c928b (at 10.8.7.32@o2ib6) reconnecting May 07 10:59:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 10:59:56 fir-md1-s1 kernel: Lustre: 102450:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557251989/real 1557251989] req@ffff98471df84800 x1632374656783232/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557251996 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 10:59:56 fir-md1-s1 kernel: Lustre: 102450:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 10:59:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a9ab9d87-75a5-5e5b-3267-58f1c7df9600 (at 10.8.1.16@o2ib6) reconnecting May 07 10:59:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 11:00:10 fir-md1-s1 kernel: Lustre: 102379:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983c55f88f00 x1631562467893040/t0(0) o101->2c0bfc93-71cb-f565-f1fb-8f804a23ec4c@10.8.1.26@o2ib6:15/0 lens 576/3264 e 0 to 0 dl 1557252015 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:00:10 fir-md1-s1 kernel: Lustre: 102379:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 07 
11:00:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fdc52cb1-2cc9-0d98-0f2b-dc082fc53acc (at 10.8.8.37@o2ib6) reconnecting May 07 11:00:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 11:00:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 30eece9e-b079-bed9-0d97-76b9b6aa4aa3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9828f3e6e000, cur 1557252023 expire 1557251873 last 1557251796 May 07 11:00:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 11:00:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 024f7538-830d-bf6f-afb4-1c31cea1bee4 (at 10.8.8.5@o2ib6) reconnecting May 07 11:00:28 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 07 11:00:37 fir-md1-s1 kernel: LustreError: 102756:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557251947, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff983cd94e3600/0xce8853b4785dde25 lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x1cca:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102756 timeout: 0 lvb_type: 0 May 07 11:00:37 fir-md1-s1 kernel: LustreError: 102756:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 36 previous similar messages May 07 11:00:42 fir-md1-s1 kernel: Lustre: 102722:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9821ce244500 x1632105407288432/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:17/0 lens 480/568 e 0 to 0 dl 1557252047 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:00:53 fir-md1-s1 kernel: LustreError: 102431:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557251963, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: 
ffff983a9c724a40/0xce8853b478d0131f lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x1cca:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102431 timeout: 0 lvb_type: 0 May 07 11:00:53 fir-md1-s1 kernel: LustreError: 102431:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message May 07 11:00:59 fir-md1-s1 kernel: Lustre: 102363:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982cc6ade900 x1631565610054720/t0(0) o101->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:4/0 lens 576/3264 e 0 to 0 dl 1557252064 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:01:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9f83e13f-6dc4-9163-be6d-ae55a9f62b03 (at 10.8.27.2@o2ib6) reconnecting May 07 11:01:02 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages May 07 11:01:13 fir-md1-s1 kernel: Lustre: 102450:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557252066/real 1557252066] req@ffff98471df84800 x1632374656783232/t0(0) o104->fir-MDT0000@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557252073 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:01:13 fir-md1-s1 kernel: Lustre: 102450:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 17 previous similar messages May 07 11:01:15 fir-md1-s1 kernel: LustreError: 102353:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557251985, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff982e1461ca40/0xce8853b4797ff43e lrc: 3/1,0 mode: --/PR res: [0x2000216f5:0x1cca:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102353 timeout: 0 lvb_type: 0 May 07 11:01:15 fir-md1-s1 kernel: LustreError: 102353:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages 
May 07 11:01:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 06a346a7-13b1-aaef-624b-fa1575cee4b9 (at 10.8.27.23@o2ib6) in 174 seconds. I think it's dead, and I am evicting it. exp ffff98370472c400, cur 1557252099 expire 1557251949 last 1557251925 May 07 11:01:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 11:01:41 fir-md1-s1 kernel: LustreError: 102664:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff9822a0a69500 x1632374675349392 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff985cce076540/0xce8853b475b24b9d lrc: 4/0,0 mode: PW/PW res: [0x2c001bded:0x21c5:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x8b6f16ece1f910b8 expref: 47 pid: 102756 timeout: 0 lvb_type: 0 May 07 11:01:41 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 07 11:01:41 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 244s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff985cce076540/0xce8853b475b24b9d lrc: 4/0,0 mode: PW/PW res: [0x2c001bded:0x21c5:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x8b6f16ece1f910b8 expref: 48 pid: 102756 timeout: 0 lvb_type: 0 May 07 11:02:20 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a1f2bba0-eb8a-4f10-2394-dea954ba5f2d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9851cae60400, cur 1557252140 expire 1557251990 last 1557251913 May 07 11:07:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3f97eb9f-1a20-7179-6229-0e033571f763 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff981f21341c00, cur 1557252430 expire 1557252280 last 1557252203 May 07 11:08:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 11:08:02 fir-md1-s1 kernel: Lustre: Skipped 52 previous similar messages May 07 11:12:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9134d6ed-2ca0-ddb8-9f7a-4d783ed8d98e (at 10.9.101.49@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984709670400, cur 1557252733 expire 1557252583 last 1557252506 May 07 11:12:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 11:15:18 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557252911/real 1557252911] req@ffff9825d3fd2a00 x1632374910249072/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557252918 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 11:15:18 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages May 07 11:15:26 fir-md1-s1 kernel: Lustre: 102473:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9821236b1e00 x1631573941647264/t0(0) o101->fff48c29-b405-5a73-271e-23103edf7e4a@10.9.107.70@o2ib4:1/0 lens 1792/3288 e 1 to 0 dl 1557252931 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:15:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client fff48c29-b405-5a73-271e-23103edf7e4a (at 10.9.107.70@o2ib4) reconnecting May 07 11:15:32 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 07 11:15:39 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557252932/real 1557252932] req@ffff9825d3fd2a00 x1632374910249072/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557252939 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:15:39 
fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 11:15:39 fir-md1-s1 kernel: Lustre: 101694:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98591bb03300 x1631577577424208/t0(0) o101->7af8b8f9-ccc3-f933-faed-451a8cc75f3f@10.9.102.38@o2ib4:14/0 lens 584/3264 e 1 to 0 dl 1557252944 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:15:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7af8b8f9-ccc3-f933-faed-451a8cc75f3f (at 10.9.102.38@o2ib4) reconnecting May 07 11:15:56 fir-md1-s1 kernel: Lustre: 103228:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9851f8a3a700 x1631535964525312/t0(0) o101->2234877f-3fc8-f3e4-bcc7-41174e21aeca@10.9.102.46@o2ib4:1/0 lens 584/3264 e 0 to 0 dl 1557252961 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:15:56 fir-md1-s1 kernel: Lustre: 103228:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 07 11:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2234877f-3fc8-f3e4-bcc7-41174e21aeca (at 10.9.102.46@o2ib4) reconnecting May 07 11:16:02 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 11:16:21 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557252974/real 1557252974] req@ffff9825d3fd2a00 x1632374910249072/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557252981 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:16:21 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 07 11:16:27 fir-md1-s1 kernel: Lustre: 102488:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982b5161e600 x1632105416289392/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:2/0 
lens 480/568 e 0 to 0 dl 1557252992 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:16:27 fir-md1-s1 kernel: Lustre: 102488:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 11:16:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client fff48c29-b405-5a73-271e-23103edf7e4a (at 10.9.107.70@o2ib4) reconnecting May 07 11:16:35 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 07 11:16:51 fir-md1-s1 kernel: LustreError: 102739:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557252921, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff984b9b7c8000/0xce8853b494c01bf8 lrc: 3/1,0 mode: --/PR res: [0x2c00128ef:0x16d04:0x0].0x0 bits 0x13/0x0 rrc: 76 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102739 timeout: 0 lvb_type: 0 May 07 11:16:54 fir-md1-s1 kernel: LustreError: 102770:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557252924, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff985a3df9cc80/0xce8853b494d6fd3f lrc: 3/1,0 mode: --/PR res: [0x2c00128ef:0x16d04:0x0].0x0 bits 0x13/0x0 rrc: 76 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102770 timeout: 0 lvb_type: 0 May 07 11:16:59 fir-md1-s1 kernel: LustreError: 103223:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557252929, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff985be1e75100/0xce8853b494fd4841 lrc: 3/1,0 mode: --/PR res: [0x2c00128ef:0x16d04:0x0].0x0 bits 0x13/0x0 rrc: 76 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 103223 timeout: 0 lvb_type: 0 May 07 11:17:00 fir-md1-s1 kernel: Lustre: 102572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early 
reply req@ffff9821b2e1a100 x1631752466331216/t0(0) o101->d29027f7-b27d-0b30-df4a-7dfdbfe2d839@10.9.112.16@o2ib4:5/0 lens 584/3264 e 1 to 0 dl 1557253025 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:17:00 fir-md1-s1 kernel: Lustre: 102572:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 07 11:17:11 fir-md1-s1 kernel: LustreError: 102580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557252941, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff983e51770000/0xce8853b49551d536 lrc: 3/1,0 mode: --/PR res: [0x2c00128ef:0x16d04:0x0].0x0 bits 0x13/0x0 rrc: 76 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102580 timeout: 0 lvb_type: 0 May 07 11:17:11 fir-md1-s1 kernel: LustreError: 102580:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message May 07 11:17:38 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557253051/real 1557253051] req@ffff9825d3fd2a00 x1632374910249072/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557253058 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:17:38 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 29 previous similar messages May 07 11:17:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a672d11f-a495-3615-fbb4-49f37049a724 (at 10.9.102.37@o2ib4) reconnecting May 07 11:17:40 fir-md1-s1 kernel: Lustre: Skipped 44 previous similar messages May 07 11:17:45 fir-md1-s1 kernel: LustreError: 102400:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff9825d3fd2a00 x1632374910249072 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff983806b2af40/0xce8853b4928d3f36 lrc: 4/0,0 mode: PR/PR res: [0x2c00128ef:0x16d04:0x0].0x0 bits 0x13/0x0 rrc: 
76 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7351830ab81ddae8 expref: 150 pid: 102544 timeout: 778479 lvb_type: 0 May 07 11:17:45 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 07 11:17:45 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff983806b2af40/0xce8853b4928d3f36 lrc: 3/0,0 mode: PR/PR res: [0x2c00128ef:0x16d04:0x0].0x0 bits 0x13/0x0 rrc: 76 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7351830ab81ddae8 expref: 151 pid: 102544 timeout: 0 lvb_type: 0 May 07 11:17:45 fir-md1-s1 kernel: Lustre: 102580:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (123:1s); client may timeout. req@ffff984416fb1500 x1631537679706432/t0(0) o101->214bcacf-deef-8b1a-7220-98313adef1de@10.9.102.36@o2ib4:11/0 lens 584/536 e 0 to 0 dl 1557253064 ref 1 fl Complete:/0/0 rc 0/0 May 07 11:17:55 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 214ab3cf-ecde-1864-c03d-fa44936a19c8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983e1ebbb000, cur 1557253075 expire 1557252925 last 1557252848 May 07 11:17:55 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 11:18:06 fir-md1-s1 kernel: Lustre: 102731:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (123:1s); client may timeout. 
req@ffff982b1df70900 x1632105416289344/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:2/0 lens 480/536 e 0 to 0 dl 1557253085 ref 1 fl Complete:/0/0 rc 301/301 May 07 11:18:06 fir-md1-s1 kernel: Lustre: 102731:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message May 07 11:18:10 fir-md1-s1 kernel: Lustre: 102554:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983fb3f36c50 x1631535692776496/t0(0) o101->16842ad7-a917-91d7-fc24-3de623e00ea2@10.9.105.59@o2ib4:15/0 lens 584/3264 e 0 to 0 dl 1557253095 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:18:10 fir-md1-s1 kernel: Lustre: 102554:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages May 07 11:18:15 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.102.36@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff983e51770000/0xce8853b49551d536 lrc: 3/0,0 mode: PR/PR res: [0x2c00128ef:0x16d04:0x0].0x0 bits 0x13/0x0 rrc: 86 type: IBT flags: 0x60200400000020 nid: 10.9.102.36@o2ib4 remote: 0x4e4ca91ad11949d expref: 178 pid: 102580 timeout: 778366 lvb_type: 0 May 07 11:18:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to a19fbd52-fc1f-6afe-5025-88bbd6370298 (at 10.9.102.36@o2ib4) May 07 11:18:16 fir-md1-s1 kernel: Lustre: Skipped 70 previous similar messages May 07 11:19:03 fir-md1-s1 kernel: LustreError: 102533:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.18.14@o2ib6) failed to reply to blocking AST (req@ffff9842e1fcad00 x1632374964244416 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff98203d2a2f40/0xce8853b498abc3ae lrc: 4/0,0 mode: PR/PR res: [0x2c00128ef:0x16c24:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x60200400000020 nid: 10.8.18.14@o2ib6 remote: 0x5567093ff1a8c9a2 expref: 106 pid: 102605 timeout: 778437 lvb_type: 0 May 07 11:19:03 
fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.18.14@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 07 11:19:03 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.18.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff98203d2a2f40/0xce8853b498abc3ae lrc: 3/0,0 mode: PR/PR res: [0x2c00128ef:0x16c24:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x60200400000020 nid: 10.8.18.14@o2ib6 remote: 0x5567093ff1a8c9a2 expref: 107 pid: 102605 timeout: 0 lvb_type: 0 May 07 11:19:21 fir-md1-s1 kernel: LustreError: 102514:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.7.33@o2ib6) failed to reply to blocking AST (req@ffff98295675f800 x1632374969322928 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff981ccc6b7bc0/0xce8853b498abc27a lrc: 4/0,0 mode: PR/PR res: [0x2c00128ef:0x16ba0:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x60200400000020 nid: 10.8.7.33@o2ib6 remote: 0x24c49ba87fa0fcd5 expref: 101 pid: 102723 timeout: 778455 lvb_type: 0 May 07 11:19:21 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.7.33@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 07 11:19:21 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.7.33@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff981ccc6b7bc0/0xce8853b498abc27a lrc: 3/0,0 mode: PR/PR res: [0x2c00128ef:0x16ba0:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x60200400000020 nid: 10.8.7.33@o2ib6 remote: 0x24c49ba87fa0fcd5 expref: 102 pid: 102723 timeout: 0 lvb_type: 0 May 07 11:20:16 fir-md1-s1 kernel: Lustre: 102379:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557253209/real 1557253209] req@ffff983c94bb1200 x1632374992269584/t0(0) o106->fir-MDT0002@10.8.18.18@o2ib6:15/16 lens 
296/280 e 0 to 1 dl 1557253216 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 11:20:16 fir-md1-s1 kernel: Lustre: 102379:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 28 previous similar messages May 07 11:20:34 fir-md1-s1 kernel: Lustre: 102432:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982eb6e3b300 x1631547621269264/t0(0) o101->9a46a636-d807-725a-1806-a4c05a6a1620@10.8.18.24@o2ib6:9/0 lens 480/568 e 0 to 0 dl 1557253239 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:20:34 fir-md1-s1 kernel: Lustre: 102432:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 28 previous similar messages May 07 11:20:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 9a46a636-d807-725a-1806-a4c05a6a1620 (at 10.8.18.24@o2ib6) reconnecting May 07 11:20:40 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 07 11:21:05 fir-md1-s1 kernel: Lustre: 102514:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff9823637e9200 x1631564452919616/t0(0) o101->442dd3b5-503d-fa23-0886-f83a3c7ec479@10.8.18.5@o2ib6:4/0 lens 480/536 e 0 to 0 dl 1557253264 ref 1 fl Complete:/0/0 rc 301/301 May 07 11:23:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cea7a43b-aa43-8a60-b35d-3d2743c001ac (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982270e0f400, cur 1557253413 expire 1557253263 last 1557253186 May 07 11:23:33 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages May 07 11:31:50 fir-md1-s1 kernel: Lustre: 102598:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557253903/real 1557253903] req@ffff9842e1fcc200 x1632375171062720/t0(0) o106->fir-MDT0000@10.8.18.16@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557253910 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 11:31:50 fir-md1-s1 kernel: Lustre: 102598:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 07 11:31:58 fir-md1-s1 kernel: Lustre: 102389:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984845b3c800 x1631560552485072/t0(0) o101->cf596cc9-7297-c8cd-7acc-7023bcdbec89@10.8.18.23@o2ib6:3/0 lens 480/568 e 1 to 0 dl 1557253923 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:31:58 fir-md1-s1 kernel: Lustre: 102389:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 11:32:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client cf596cc9-7297-c8cd-7acc-7023bcdbec89 (at 10.8.18.23@o2ib6) reconnecting May 07 11:32:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.18.23@o2ib6) May 07 11:32:04 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages May 07 11:33:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f4f411cb-15ce-1f07-9f23-424f6c4aa426 (at 10.8.27.23@o2ib6) in 186 seconds. I think it's dead, and I am evicting it. 
exp ffff981e5d272c00, cur 1557254014 expire 1557253864 last 1557253828 May 07 11:33:34 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 11:34:50 fir-md1-s1 kernel: LustreError: 102487:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff98374761b000 x1632375220416704/t0(0) o104->fir-MDT0000@10.8.18.16@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 May 07 11:34:51 fir-md1-s1 kernel: LustreError: 102598:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9847eb6de300 x1632375220777312/t0(0) o104->fir-MDT0000@10.8.18.16@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 May 07 11:42:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 07 11:42:48 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages May 07 11:43:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e8fb0979-89ce-005c-84f4-115c1a5d2ebe (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983856711800, cur 1557254630 expire 1557254480 last 1557254403 May 07 11:43:50 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 07 11:56:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4bf91ef2-656c-56f2-bc8d-6317a85073ff (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982b6a248400, cur 1557255372 expire 1557255222 last 1557255145 May 07 11:56:12 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 07 11:59:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 07 11:59:24 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages May 07 12:08:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 02bcd367-45c9-f294-deda-08045d8722c3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983cbd5c7400, cur 1557256138 expire 1557255988 last 1557255911 May 07 12:08:58 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 12:10:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.109.7@o2ib4) May 07 12:10:23 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages May 07 12:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 264a8fc1-0463-2875-ec85-6c3245c570a8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982e1ffe6c00, cur 1557256896 expire 1557256746 last 1557256669 May 07 12:21:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 12:21:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 07 12:21:55 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 12:32:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d8583d7f-ba8f-1675-e5a4-3f69f2d6da58 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9854713c1400, cur 1557257546 expire 1557257396 last 1557257319 May 07 12:32:26 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 12:32:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 07 12:32:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 12:35:44 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557257737/real 1557257737] req@ffff982b4805d100 x1632375969304336/t0(0) o106->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557257744 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 12:35:44 fir-md1-s1 kernel: Lustre: 102400:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 37 previous similar messages May 07 12:35:52 fir-md1-s1 kernel: Lustre: 102651:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982742a87800 x1631544203787824/t0(0) o101->2e1837bb-385a-af64-a5d1-7a58230af8b2@10.9.0.64@o2ib4:27/0 lens 480/568 e 1 to 0 dl 1557257757 ref 2 fl Interpret:/0/0 rc 0/0 May 07 12:35:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2e1837bb-385a-af64-a5d1-7a58230af8b2 (at 10.9.0.64@o2ib4) reconnecting May 07 12:35:58 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 07 12:37:00 fir-md1-s1 kernel: Lustre: 102647:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557257813/real 1557257813] req@ffff982c24431e00 x1632375970830080/t0(0) o106->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557257820 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 12:37:00 fir-md1-s1 kernel: Lustre: 102647:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages May 07 12:43:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 12:43:36 fir-md1-s1 kernel: 
Lustre: Skipped 11 previous similar messages May 07 12:44:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ab8fc5bf-6f10-4c83-559e-3f6fc2255fd7 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982221d80c00, cur 1557258271 expire 1557258121 last 1557258044 May 07 12:44:31 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 12:45:26 fir-md1-s1 kernel: Lustre: 102483:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557258319/real 1557258319] req@ffff982723ad5100 x1632376123289728/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557258326 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 12:45:26 fir-md1-s1 kernel: Lustre: 102483:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages May 07 12:45:44 fir-md1-s1 kernel: Lustre: 102744:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982004bfe850 x1632105562970576/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:19/0 lens 480/568 e 0 to 0 dl 1557258349 ref 2 fl Interpret:/0/0 rc 0/0 May 07 12:45:44 fir-md1-s1 kernel: Lustre: 102744:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 12:45:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 12:45:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 12:52:49 fir-md1-s1 kernel: LustreError: 102605:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) returned error from blocking AST (req@ffff982dd2faef00 x1632376246659840 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9847ac2ff2c0/0xce8853b536207003 lrc: 4/0,0 mode: PR/PR res: [0x2c00128ef:0x16e58:0x0].0x0 bits 0x13/0x0 rrc: 42 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 
0x5191cef28879d5f7 expref: 163 pid: 102658 timeout: 784190 lvb_type: 0 May 07 12:52:49 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 07 12:52:49 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9847ac2ff2c0/0xce8853b536207003 lrc: 3/0,0 mode: PR/PR res: [0x2c00128ef:0x16e58:0x0].0x0 bits 0x13/0x0 rrc: 40 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x5191cef28879d5f7 expref: 164 pid: 102658 timeout: 0 lvb_type: 0 May 07 12:56:59 fir-md1-s1 kernel: Lustre: 102548:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9823e2317b00 x1632105591779504/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:4/0 lens 480/568 e 0 to 0 dl 1557259024 ref 2 fl Interpret:/0/0 rc 0/0 May 07 12:56:59 fir-md1-s1 kernel: Lustre: 102548:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages May 07 12:57:05 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557258994/real 1557258994] req@ffff98253cf4ec00 x1632376304376848/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557259025 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 12:57:05 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 41 previous similar messages May 07 12:57:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 12:57:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 12:57:05 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages May 07 12:58:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard 
from client 4ff37796-c024-485d-10eb-91250bc869b7 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9820eb37c000, cur 1557259099 expire 1557258949 last 1557258872 May 07 12:58:19 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages May 07 13:03:19 fir-md1-s1 kernel: Lustre: 102739:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9820352f6000 x1631534820030576/t0(0) o101->8a2e0e99-b7e2-2b0e-9dbb-18a669bd784a@10.9.105.55@o2ib4:24/0 lens 584/3264 e 1 to 0 dl 1557259404 ref 2 fl Interpret:/0/0 rc 0/0 May 07 13:03:19 fir-md1-s1 kernel: Lustre: 102739:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 13:03:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6ea99810-c4ef-751c-68b4-b60bb649210c (at 10.8.8.35@o2ib6) reconnecting May 07 13:03:25 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 13:09:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 07 13:09:55 fir-md1-s1 kernel: Lustre: Skipped 108 previous similar messages May 07 13:22:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 13:22:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 13:22:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3695ec97-1e33-a94a-7a18-4e8ccb31b79d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984ac6e40000, cur 1557260576 expire 1557260426 last 1557260349 May 07 13:22:56 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 13:33:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f208ebb8-4376-4043-6760-d7bbad764cbd (at 10.9.101.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9848d1a6c400, cur 1557261239 expire 1557261089 last 1557261012 May 07 13:33:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 13:50:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f2044964-9abb-674f-9c85-4840b63d6662 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9838cdea6800, cur 1557262227 expire 1557262077 last 1557262000 May 07 13:50:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 13:51:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 13:51:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:01:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5ed96454-6df8-835b-3f90-deee2b226859 (at 10.9.101.43@o2ib4) May 07 14:01:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:03:19 fir-md1-s1 kernel: Lustre: 102597:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 07 14:08:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 33e4d383-afe4-7850-9e31-74f1d3fe3003 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982a94f1ac00, cur 1557263324 expire 1557263174 last 1557263097 May 07 14:08:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:09:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 14:09:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:10:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 38315ffb-5342-ed63-c1e3-0b2c0fe96b7c (at 10.8.10.29@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. 
exp ffff9830ea347c00, cur 1557263400 expire 1557263250 last 1557263189 May 07 14:10:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:10:01 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557263394/real 1557263394] req@ffff981e7b659200 x1632377475198352/t0(0) o106->fir-MDT0000@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557263401 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 14:10:01 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages May 07 14:10:09 fir-md1-s1 kernel: Lustre: 102420:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981ce9ac2100 x1632105827734576/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:14/0 lens 480/568 e 1 to 0 dl 1557263414 ref 2 fl Interpret:/0/0 rc 0/0 May 07 14:10:09 fir-md1-s1 kernel: Lustre: 102420:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 32 previous similar messages May 07 14:10:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 14:10:15 fir-md1-s1 kernel: Lustre: Skipped 98 previous similar messages May 07 14:10:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6faadd98-d7dc-b7ed-eb72-86f6a0af37e9 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984b2264b000, cur 1557263416 expire 1557263266 last 1557263189 May 07 14:10:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 14:11:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d959b386-a70e-7a16-dcb7-b27fd3011742 (at 10.9.114.5@o2ib4) in 184 seconds. I think it's dead, and I am evicting it. 
exp ffff9848de742000, cur 1557263476 expire 1557263326 last 1557263292 May 07 14:14:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 14:14:55 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 14:17:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 24f54b66-220b-e07b-5df6-a50a2f3914f4 (at 10.8.24.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98480ef5e400, cur 1557263835 expire 1557263685 last 1557263608 May 07 14:17:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:23:05 fir-md1-s1 kernel: Lustre: 102562:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264178/real 1557264178] req@ffff981ee7609800 x1632377705116896/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264185 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 14:23:05 fir-md1-s1 kernel: Lustre: 102562:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 07 14:23:13 fir-md1-s1 kernel: Lustre: 102488:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9826b5a09200 x1632105852849840/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/568 e 1 to 0 dl 1557264198 ref 2 fl Interpret:/0/0 rc 0/0 May 07 14:23:19 fir-md1-s1 kernel: Lustre: 102647:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264192/real 1557264192] req@ffff982337bf5400 x1632377705117424/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264199 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:23:19 fir-md1-s1 kernel: Lustre: 102647:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 14:23:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 
14:23:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:23:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 31568f88-a5da-6bea-eb69-146c5b1b856e (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c72267c00, cur 1557264201 expire 1557264051 last 1557263974 May 07 14:23:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:28:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 14:28:11 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 07 14:35:13 fir-md1-s1 kernel: Lustre: 101750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264906/real 1557264906] req@ffff98298f00b000 x1632377914923664/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264913 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 14:35:20 fir-md1-s1 kernel: Lustre: 101750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264913/real 1557264913] req@ffff98298f00b000 x1632377914923664/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264920 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:35:20 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264913/real 1557264913] req@ffff98297a60c200 x1632377914924368/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264920 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:35:20 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 14:35:21 fir-md1-s1 kernel: Lustre: 102651:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982c2af63000 x1632105900393168/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:26/0 lens 480/568 
e 1 to 0 dl 1557264926 ref 2 fl Interpret:/0/0 rc 0/0 May 07 14:35:21 fir-md1-s1 kernel: Lustre: 102651:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 14:35:27 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264920/real 1557264920] req@ffff98297a60c200 x1632377914924368/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264927 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:35:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 14:35:41 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264934/real 1557264934] req@ffff98297a60c200 x1632377914924368/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264941 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:35:41 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 07 14:35:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 14:36:02 fir-md1-s1 kernel: Lustre: 102548:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264955/real 1557264955] req@ffff98297a60c200 x1632377914924368/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264962 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:36:02 fir-md1-s1 kernel: Lustre: 101750:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557264955/real 1557264955] req@ffff98298f00b000 x1632377914923664/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557264962 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:36:02 fir-md1-s1 kernel: Lustre: 
101750:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 14:36:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 14:36:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5960584d-8bd1-90dd-6045-b7e6792172b7 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981cdfcf3000, cur 1557264972 expire 1557264822 last 1557264745 May 07 14:36:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:39:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 14:39:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 14:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 926e7156-e3ea-d21b-5570-976ffcc525b8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982a66cbf000, cur 1557265351 expire 1557265201 last 1557265124 May 07 14:42:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:45:58 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557265551/real 1557265551] req@ffff9822e9a02100 x1632378092247808/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557265558 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 14:45:58 fir-md1-s1 kernel: Lustre: 102546:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557265551/real 1557265551] req@ffff982ad0f6e300 x1632378092248032/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557265558 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 14:45:58 fir-md1-s1 kernel: Lustre: 102546:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 14:46:06 fir-md1-s1 kernel: Lustre: 102667:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't 
add any time (5/5), not sending early reply req@ffff98295a31b600 x1632105928627680/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:11/0 lens 480/568 e 1 to 0 dl 1557265571 ref 2 fl Interpret:/0/0 rc 0/0 May 07 14:46:06 fir-md1-s1 kernel: Lustre: 102667:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 14:46:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 14:46:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 14:46:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e5c6e418-9402-b4c8-b17f-405046b45b54 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9845446ef400, cur 1557265599 expire 1557265449 last 1557265372 May 07 14:46:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:49:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 14:49:32 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages May 07 14:50:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6164810e-2f17-0da4-b2de-82ab2fa31076 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983e77eb6000, cur 1557265802 expire 1557265652 last 1557265575 May 07 14:50:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 14:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 96d0d7ea-f6f4-656c-8e7f-84d0f2fccdc9 (at 10.9.108.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985cfa684400, cur 1557266167 expire 1557266017 last 1557265940 May 07 14:56:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 15:01:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to be965067-fc27-311c-26c4-b94a4613c91e (at 10.9.108.24@o2ib4) May 07 15:01:06 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 15:07:54 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557266867/real 1557266867] req@ffff98218ee9c500 x1632378445325568/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557266874 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 15:07:54 fir-md1-s1 kernel: Lustre: 102363:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 07 15:08:02 fir-md1-s1 kernel: Lustre: 102431:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982c9c7a3900 x1632105950737952/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:7/0 lens 480/568 e 1 to 0 dl 1557266887 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:08:02 fir-md1-s1 kernel: Lustre: 102431:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 15:08:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 15:08:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 15:08:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dfc30246-ecbd-f024-7bb0-28e603f81590 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98324f610400, cur 1557266913 expire 1557266763 last 1557266686 May 07 15:08:33 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 15:12:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 15:12:45 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 07 15:15:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b8c209bc-ddc2-5ae2-ef34-ca529f4f7274 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983edfba0800, cur 1557267330 expire 1557267180 last 1557267103 May 07 15:15:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 15:23:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 15:23:15 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 15:27:34 fir-md1-s1 kernel: Lustre: 102713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268047/real 1557268047] req@ffff9828efeca400 x1632378781177952/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557268054 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 15:27:34 fir-md1-s1 kernel: Lustre: 102713:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 15:27:41 fir-md1-s1 kernel: Lustre: 102713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268054/real 1557268054] req@ffff9828efeca400 x1632378781177952/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557268061 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 15:27:42 fir-md1-s1 kernel: Lustre: 102446:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982077a4a700 x1632105975482992/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:17/0 lens 480/568 e 1 to 0 dl 1557268067 ref 2 fl Interpret:/0/0 
rc 0/0 May 07 15:27:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 15:27:55 fir-md1-s1 kernel: Lustre: 102713:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268068/real 1557268068] req@ffff9828efeca400 x1632378781177952/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557268075 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 15:27:55 fir-md1-s1 kernel: Lustre: 102713:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 15:28:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 15:28:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4f271412-c3e2-b3b8-adf4-b291b150b7ab (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9833b06b1000, cur 1557268094 expire 1557267944 last 1557267867 May 07 15:28:14 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 15:33:25 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268398/real 1557268398] req@ffff9828ab6cd100 x1632378878908448/t0(0) o106->fir-MDT0002@10.9.109.27@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557268405 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 15:33:25 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 15:33:43 fir-md1-s1 kernel: Lustre: 102722:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9820456eec00 x1632105980896576/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/568 e 0 to 0 dl 1557268428 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:33:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 
10.0.10.3@o2ib7) reconnecting May 07 15:33:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 15:33:49 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages May 07 15:34:07 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268440/real 1557268440] req@ffff9828ab6cd100 x1632378878908448/t0(0) o106->fir-MDT0002@10.9.109.27@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557268447 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 15:34:07 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 15:34:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 15:34:38 fir-md1-s1 kernel: Lustre: 102709:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9845f0610600 x1631559623103184/t0(0) o101->d5482dec-dd94-00f8-737a-5b6b97429b46@10.9.106.50@o2ib4:13/0 lens 1800/3288 e 1 to 0 dl 1557268483 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:34:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client d5482dec-dd94-00f8-737a-5b6b97429b46 (at 10.9.106.50@o2ib4) reconnecting May 07 15:34:58 fir-md1-s1 kernel: Lustre: 102479:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98223d72da00 x1631559404838096/t0(0) o101->dacb83f0-b432-ea21-cf1b-fb1ac63fd0b0@10.9.101.62@o2ib4:3/0 lens 576/3264 e 0 to 0 dl 1557268503 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:35:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client dacb83f0-b432-ea21-cf1b-fb1ac63fd0b0 (at 10.9.101.62@o2ib4) reconnecting May 07 15:35:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 15:35:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 15:35:23 fir-md1-s1 kernel: 
Lustre: Skipped 1 previous similar message May 07 15:35:24 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268517/real 1557268517] req@ffff9828ab6cd100 x1632378878908448/t0(0) o106->fir-MDT0002@10.9.109.27@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557268524 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 15:35:24 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 30 previous similar messages May 07 15:35:29 fir-md1-s1 kernel: Lustre: 102565:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981d19a98300 x1631585291395072/t0(0) o101->a672d11f-a495-3615-fbb4-49f37049a724@10.9.102.37@o2ib4:4/0 lens 480/568 e 1 to 0 dl 1557268534 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:35:39 fir-md1-s1 kernel: Lustre: 102712:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983b272de600 x1631577579800240/t0(0) o101->7af8b8f9-ccc3-f933-faed-451a8cc75f3f@10.9.102.38@o2ib4:14/0 lens 480/568 e 0 to 0 dl 1557268544 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:35:39 fir-md1-s1 kernel: Lustre: 102712:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 07 15:35:56 fir-md1-s1 kernel: Lustre: 102658:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983e0bae0c00 x1631559806953376/t0(0) o101->02dfd968-e7b1-52cc-0db8-aa0d10c0832c@10.9.102.19@o2ib4:1/0 lens 480/568 e 0 to 0 dl 1557268561 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:35:56 fir-md1-s1 kernel: Lustre: 102658:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages May 07 15:35:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2e1837bb-385a-af64-a5d1-7a58230af8b2 (at 10.9.0.64@o2ib4) reconnecting May 07 15:35:56 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 07 
15:36:03 fir-md1-s1 kernel: LustreError: 102504:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557268473, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff98595ee71440/0xce8853b639bac92e lrc: 3/1,0 mode: --/PR res: [0x2c001c0e4:0xdf7f:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102504 timeout: 0 lvb_type: 0 May 07 15:36:30 fir-md1-s1 kernel: Lustre: 102712:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff98321d37bf00 x1631798623927232/t0(0) o36->32b547fb-9ba2-3fc7-59d4-2c94ddbea8d3@10.9.107.72@o2ib4:5/0 lens 520/2888 e 0 to 0 dl 1557268595 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:36:30 fir-md1-s1 kernel: Lustre: 102712:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 15:42:25 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268938/real 1557268938] req@ffff982077a03000 x1632379025904032/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557268945 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 15:42:25 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 194 previous similar messages May 07 15:42:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1a5834f3-c40d-8140-1495-3283ea07b4f3 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98319cf25c00, cur 1557268959 expire 1557268809 last 1557268732 May 07 15:42:39 fir-md1-s1 kernel: Lustre: Skipped 101 previous similar messages May 07 15:47:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 15:47:14 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages May 07 16:03:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e6bce195-8b75-8100-43dd-92ae3b1a9171 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9824010e6c00, cur 1557270211 expire 1557270061 last 1557269984 May 07 16:03:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 16:04:08 fir-md1-s1 kernel: Lustre: 102488:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557270241/real 1557270241] req@ffff9828efe50f00 x1632379392127712/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557270248 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 16:04:08 fir-md1-s1 kernel: Lustre: 102488:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 16:04:17 fir-md1-s1 kernel: Lustre: 102564:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9825cc2a4b00 x1632106027745008/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:21/0 lens 480/568 e 1 to 0 dl 1557270261 ref 2 fl Interpret:/0/0 rc 0/0 May 07 16:04:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d605442b-0f1f-31ad-7155-f8056f3fdc77 (at 10.8.26.4@o2ib6) May 07 16:04:19 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 16:04:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 16:04:22 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages May 07 16:04:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 
10.0.10.3@o2ib7) reconnecting May 07 16:04:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dda09422-c907-407d-036d-a3a6d44772c2 (at 10.8.10.29@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. exp ffff983b91d5ac00, cur 1557270287 expire 1557270137 last 1557270063 May 07 16:04:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 16:10:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d13425d2-f7f9-4d3b-4526-236128461cab (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983cddb4e800, cur 1557270614 expire 1557270464 last 1557270387 May 07 16:10:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 16:15:44 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557270937/real 1557270937] req@ffff98283b306600 x1632379589193280/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557270944 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 16:15:44 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 16:15:52 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557270944/real 1557270944] req@ffff98283b306600 x1632379589193280/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557270951 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 16:15:53 fir-md1-s1 kernel: Lustre: 102739:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98297a65ef00 x1632106040463296/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:27/0 lens 480/568 e 1 to 0 dl 1557270957 ref 2 fl Interpret:/0/0 rc 0/0 May 07 16:15:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 
16:15:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 16:15:58 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages May 07 16:16:06 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557270959/real 1557270959] req@ffff98283b306600 x1632379589193280/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557270966 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 16:16:06 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 16:16:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f10e4c66-3318-da71-f923-7064d0e55a01 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff981e81a14000, cur 1557270978 expire 1557270828 last 1557270751 May 07 16:16:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 16:32:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e8e31406-a81e-9855-5960-72750e6f5a6f (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9830ed741400, cur 1557271958 expire 1557271808 last 1557271731 May 07 16:32:38 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 16:35:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 16:35:45 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 07 16:45:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c1994563-ed97-f834-cb7a-c54fb48168a8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983ca2f4e400, cur 1557272749 expire 1557272599 last 1557272522 May 07 16:45:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 16:53:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 675a8ac9-b44f-ac67-d5e8-cc21a647f68f (at 10.9.109.48@o2ib4) May 07 16:53:41 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 16:57:00 fir-md1-s1 kernel: Lustre: 102359:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984aa668c200 x1631308619085024/t0(0) o101->deb1d8ed-7793-d518-4967-f9ac969b072e@10.9.112.14@o2ib4:5/0 lens 1768/3288 e 1 to 0 dl 1557273425 ref 2 fl Interpret:/0/0 rc 0/0 May 07 16:57:01 fir-md1-s1 kernel: Lustre: 102551:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff984aa668ec00 x1631547276487216/t0(0) o101->c5d29146-8e69-99bb-85ae-0e928604facc@10.8.0.68@o2ib6:6/0 lens 576/3264 e 1 to 0 dl 1557273426 ref 2 fl Interpret:/0/0 rc 0/0 May 07 16:57:03 fir-md1-s1 kernel: Lustre: 102572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9829a32cd700 x1631593566271472/t0(0) o101->3ddfc0e1-d9a8-93ac-6e7d-3e2edb9b897f@10.8.0.65@o2ib6:8/0 lens 1768/3288 e 1 to 0 dl 1557273428 ref 2 fl Interpret:/0/0 rc 0/0 May 07 16:57:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client deb1d8ed-7793-d518-4967-f9ac969b072e (at 10.9.112.14@o2ib4) reconnecting May 07 16:57:07 fir-md1-s1 kernel: Lustre: 101908:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982aefb37b00 x1632904470406992/t0(0) o36->f4b2b27d-5164-df59-9352-2a8dbf5b8bcd@10.8.23.14@o2ib6:12/0 lens 488/3152 e 1 to 0 dl 1557273432 ref 2 fl Interpret:/0/0 rc 0/0 May 07 16:57:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3ddfc0e1-d9a8-93ac-6e7d-3e2edb9b897f (at 10.8.0.65@o2ib6) reconnecting May 07 16:57:09 fir-md1-s1 kernel: 
Lustre: Skipped 1 previous similar message May 07 16:57:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client f4b2b27d-5164-df59-9352-2a8dbf5b8bcd (at 10.8.23.14@o2ib6) reconnecting May 07 16:57:15 fir-md1-s1 kernel: LustreError: 101689:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 242836: error -110 May 07 16:57:16 fir-md1-s1 kernel: LustreError: 102525:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 275059: error -110 May 07 16:57:18 fir-md1-s1 kernel: LustreError: 101750:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 43046: error -110 May 07 16:57:22 fir-md1-s1 kernel: LustreError: 102676:0:(upcall_cache.c:233:upcall_cache_get_entry()) acquire for key 317524: error -110 May 07 16:57:23 fir-md1-s1 kernel: Lustre: 102357:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff984a2ef5c800 x1631547276582224/t0(0) o101->c5d29146-8e69-99bb-85ae-0e928604facc@10.8.0.68@o2ib6:28/0 lens 592/3264 e 0 to 0 dl 1557273448 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:05:48 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557273941/real 1557273941] req@ffff98250bb16900 x1632380429888496/t0(0) o106->fir-MDT0002@10.9.109.43@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557273948 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 17:05:48 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 17:05:55 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557273948/real 1557273948] req@ffff98250bb16900 x1632380429888496/t0(0) o106->fir-MDT0002@10.9.109.43@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557273955 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 17:06:02 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow 
reply: [sent 1557273955/real 1557273955] req@ffff98250bb16900 x1632380429888496/t0(0) o106->fir-MDT0002@10.9.109.43@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557273962 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 17:06:06 fir-md1-s1 kernel: Lustre: 102431:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982b1f3b6300 x1632106117304496/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:11/0 lens 480/568 e 0 to 0 dl 1557273971 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:06:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:06:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 17:06:12 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages May 07 17:06:16 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557273969/real 1557273969] req@ffff98250bb16900 x1632380429888496/t0(0) o106->fir-MDT0002@10.9.109.43@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557273976 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 17:06:16 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 17:06:37 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557273990/real 1557273990] req@ffff98250bb16900 x1632380429888496/t0(0) o106->fir-MDT0002@10.9.109.43@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557273997 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 17:06:37 fir-md1-s1 kernel: Lustre: 102427:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 17:06:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:07:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 
6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:07:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 45f832a0-7d8a-4d46-d891-fd630cd0c7e1 (at 10.9.109.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c07a55800, cur 1557274037 expire 1557273887 last 1557273810 May 07 17:07:17 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 07 17:08:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4b5bfe68-a7e2-f5c6-e273-5c76b638d869 (at 10.8.26.4@o2ib6) in 164 seconds. I think it's dead, and I am evicting it. exp ffff982740e4b000, cur 1557274113 expire 1557273963 last 1557273949 May 07 17:08:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 17:10:13 fir-md1-s1 kernel: Lustre: 102440:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557274206/real 1557274206] req@ffff9825d1589200 x1632380505743696/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557274213 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 17:10:13 fir-md1-s1 kernel: Lustre: 102440:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 17:10:31 fir-md1-s1 kernel: Lustre: 101919:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9825a85b9800 x1632106121243568/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:6/0 lens 480/568 e 0 to 0 dl 1557274236 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:10:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:11:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bfd38f5b-ccd8-05cb-0d7c-c4305577818f (at 10.9.109.34@o2ib4) in 193 seconds. I think it's dead, and I am evicting it. 
exp ffff985ce300f400, cur 1557274265 expire 1557274115 last 1557274072 May 07 17:11:05 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 07 17:12:40 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557274353/real 1557274353] req@ffff981f30553900 x1632380547577008/t0(0) o106->fir-MDT0002@10.9.109.36@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557274360 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 17:12:40 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 17:12:58 fir-md1-s1 kernel: Lustre: 102708:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff982ccc79a100 x1632106122940736/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:3/0 lens 480/568 e 0 to 0 dl 1557274383 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:13:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:14:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:14:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 17:14:43 fir-md1-s1 kernel: Lustre: 102370:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9822c71a1200 x1632106123155552/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/568 e 0 to 0 dl 1557274488 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:15:14 fir-md1-s1 kernel: Lustre: 102504:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557274507/real 1557274507] req@ffff9826e3a91b00 x1632380578040192/t0(0) o106->fir-MDT0002@10.9.109.44@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557274514 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 17:15:14 fir-md1-s1 kernel: 
Lustre: 102504:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages May 07 17:16:12 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 75721472-b608-1cd6-ab19-19e98893e801 (at 10.9.109.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9847a17b4c00, cur 1557274572 expire 1557274422 last 1557274345 May 07 17:16:12 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 07 17:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:16:30 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 17:16:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 17:16:31 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages May 07 17:16:31 fir-md1-s1 kernel: Lustre: 102504:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (131:2s); client may timeout. req@ffff9822c71a1200 x1632106123155552/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:18/0 lens 480/536 e 0 to 0 dl 1557274589 ref 1 fl Complete:/0/0 rc 301/301 May 07 17:18:56 fir-md1-s1 kernel: Lustre: 102657:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983bd5a5e300 x1631559623938416/t0(0) o101->d5482dec-dd94-00f8-737a-5b6b97429b46@10.9.106.50@o2ib4:1/0 lens 1792/3288 e 1 to 0 dl 1557274741 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:20:15 fir-md1-s1 kernel: LustreError: 102400:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557274725, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff98362d729440/0xce8853b6be4bbd0f lrc: 3/1,0 mode: --/PR res: [0x2c001c0e4:0xdf7f:0x0].0x0 bits 0x13/0x0 rrc: 15 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102400 timeout: 0 lvb_type: 0 May 07 
17:20:19 fir-md1-s1 kernel: Lustre: 102522:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557274812/real 1557274812] req@ffff9831f8bb5100 x1632380652724432/t0(0) o104->fir-MDT0002@10.9.109.39@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557274819 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 17:20:19 fir-md1-s1 kernel: Lustre: 102522:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23 previous similar messages May 07 17:26:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 503071be-93fc-d45b-6e4f-1c629c6ebc8f (at 10.9.109.39@o2ib4) May 07 17:26:32 fir-md1-s1 kernel: Lustre: Skipped 82 previous similar messages May 07 17:26:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3c830103-98b8-4842-3353-440daf52ac65 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff98308a69b400, cur 1557275209 expire 1557275059 last 1557274982 May 07 17:26:49 fir-md1-s1 kernel: Lustre: Skipped 80 previous similar messages May 07 17:32:41 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557275554/real 1557275554] req@ffff982a28e7c500 x1632380887553696/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557275561 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 17:32:41 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 07 17:32:49 fir-md1-s1 kernel: Lustre: 101683:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98223224e600 x1632106147192880/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:24/0 lens 480/568 e 1 to 0 dl 1557275574 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:32:49 fir-md1-s1 kernel: Lustre: 101683:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 07 17:32:56 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:32:56 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages May 07 17:36:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 17:36:54 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages May 07 17:36:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e894ed2e-6861-5e94-03be-1d93c9d595c2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983a6ba00800, cur 1557275815 expire 1557275665 last 1557275588 May 07 17:36:55 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 07 17:44:11 fir-md1-s1 kernel: Lustre: 102483:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557276244/real 1557276244] req@ffff982aebcff500 x1632381093311024/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557276251 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 17:44:11 fir-md1-s1 kernel: Lustre: 102483:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 07 17:44:19 fir-md1-s1 kernel: Lustre: 102667:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982a5a721b00 x1632106162915872/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:24/0 lens 480/568 e 1 to 0 dl 1557276264 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:44:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:44:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 17:44:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:45:28 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 17:49:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 17:49:05 fir-md1-s1 kernel: Lustre: Skipped 27 previous similar messages May 07 17:50:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 39f2415e-d6dc-418d-be1f-b6ab1020f192 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9826c9b6d800, cur 1557276648 expire 1557276498 last 1557276421 May 07 17:50:48 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 17:55:54 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557276947/real 1557276947] req@ffff9821dfca2400 x1632381284026672/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557276954 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 17:55:54 fir-md1-s1 kernel: Lustre: 102593:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 07 17:56:02 fir-md1-s1 kernel: Lustre: 102562:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9820dfaacb00 x1632106178587680/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:7/0 lens 480/568 e 1 to 0 dl 1557276967 ref 2 fl Interpret:/0/0 rc 0/0 May 07 17:56:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 18:00:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 18:00:11 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 07 18:08:12 fir-md1-s1 kernel: Lustre: 102420:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557277685/real 1557277685] req@ffff9826a9abc800 x1632381479074240/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 
296/280 e 0 to 1 dl 1557277692 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 18:08:12 fir-md1-s1 kernel: Lustre: 102446:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557277685/real 1557277685] req@ffff98299aa70300 x1632381479074096/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557277692 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 18:08:12 fir-md1-s1 kernel: Lustre: 102446:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 07 18:08:20 fir-md1-s1 kernel: Lustre: 101750:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982315beb600 x1632106188728176/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:25/0 lens 480/568 e 1 to 0 dl 1557277705 ref 2 fl Interpret:/0/0 rc 0/0 May 07 18:08:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 18:08:26 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 18:08:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 18:09:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 18:09:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client db97291c-0e58-1db8-997e-93e0079f42e9 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983999771400, cur 1557277767 expire 1557277617 last 1557277540 May 07 18:09:27 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 18:12:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 18:12:46 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 07 18:17:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bee493fe-0255-e0d8-44b9-deb38a2dee88 (at 10.9.0.61@o2ib4) reconnecting May 07 18:17:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 18:18:19 fir-md1-s1 kernel: Lustre: 102983:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557278292/real 1557278292] req@ffff985c4fffb000 x1632381614071184/t0(0) o106->fir-MDT0000@10.9.112.13@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557278299 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 18:18:19 fir-md1-s1 kernel: Lustre: 102983:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 36 previous similar messages May 07 18:18:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bee493fe-0255-e0d8-44b9-deb38a2dee88 (at 10.9.0.61@o2ib4) reconnecting May 07 18:18:33 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 18:19:16 fir-md1-s1 kernel: Lustre: 102642:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff985c4ffff800 x1632169622286896/t0(0) o101->af26ea58-5b3c-18ce-a05f-14f0d6aed832@10.9.0.63@o2ib4:21/0 lens 480/568 e 1 to 0 dl 1557278361 ref 2 fl Interpret:/0/0 rc 0/0 May 07 18:19:16 fir-md1-s1 kernel: Lustre: 102642:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 07 18:20:08 fir-md1-s1 kernel: LNet: Service thread pid 102983 was inactive for 200.27s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: May 07 18:20:08 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message May 07 18:20:08 fir-md1-s1 kernel: Pid: 102983, comm: mdt03_063 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 May 07 18:20:08 fir-md1-s1 kernel: Call Trace: May 07 18:20:08 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] May 07 18:20:08 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] May 07 18:20:08 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] May 07 18:20:08 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 07 18:20:08 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 07 18:20:08 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 May 07 18:20:08 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 07 18:20:08 fir-md1-s1 kernel: [] 0xffffffffffffffff May 07 18:20:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557278408.102983 May 07 18:20:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d654f0f4-19f6-d029-e9b9-629b432c9609 (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9829f06e0c00, cur 1557278415 expire 1557278265 last 1557278188 May 07 18:20:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 18:20:15 fir-md1-s1 kernel: LNet: Service thread pid 102983 completed after 206.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 07 18:25:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.15.4@o2ib6) May 07 18:25:10 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 07 18:28:21 fir-md1-s1 kernel: LNetError: 101318:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 18:40:44 fir-md1-s1 kernel: Lustre: 101919:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557279637/real 1557279637] req@ffff98218ea61500 x1632382003743152/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557279644 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 18:40:44 fir-md1-s1 kernel: Lustre: 101919:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 27 previous similar messages May 07 18:40:52 fir-md1-s1 kernel: Lustre: 102479:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981eea2da700 x1632106205604816/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:27/0 lens 480/568 e 1 to 0 dl 1557279657 ref 2 fl Interpret:/0/0 rc 0/0 May 07 18:40:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 18:40:58 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages May 07 18:40:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 18:40:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 18:41:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) 
reconnecting May 07 18:41:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6c0661ac-98eb-50dd-0f02-14b2a8b09ba5 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983a4e2e7400, cur 1557279702 expire 1557279552 last 1557279475 May 07 18:41:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 18:47:14 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557280027/real 1557280027] req@ffff982048fd2700 x1632382112032240/t0(0) o106->fir-MDT0002@10.8.15.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557280034 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 18:47:14 fir-md1-s1 kernel: Lustre: 102435:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages May 07 18:47:22 fir-md1-s1 kernel: Lustre: 102507:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98397322cb00 x1631560819240592/t0(0) o101->30832ebc-e700-a2ed-766e-a894d22af97d@10.9.113.8@o2ib4:27/0 lens 1784/3288 e 1 to 0 dl 1557280047 ref 2 fl Interpret:/0/0 rc 0/0 May 07 18:47:22 fir-md1-s1 kernel: Lustre: 102507:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 18:47:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 30832ebc-e700-a2ed-766e-a894d22af97d (at 10.9.113.8@o2ib4) reconnecting May 07 18:47:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 18:48:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 18:48:40 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 18:49:41 fir-md1-s1 kernel: LustreError: 102538:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.4@o2ib6) failed to reply to blocking AST (req@ffff98320e3b1500 x1632382112105056 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: 
ffff9831a065e9c0/0xce8853b7307423b7 lrc: 4/0,0 mode: PR/PR res: [0x2c002409c:0xfdd9:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0xeb785fd96c6b0d44 expref: 1069 pid: 102676 timeout: 805595 lvb_type: 0 May 07 18:49:41 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.15.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 07 18:49:41 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.15.4@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9831a065e9c0/0xce8853b7307423b7 lrc: 3/0,0 mode: PR/PR res: [0x2c002409c:0xfdd9:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.15.4@o2ib6 remote: 0xeb785fd96c6b0d44 expref: 1070 pid: 102676 timeout: 0 lvb_type: 0 May 07 18:50:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 785a0a0b-c618-c066-907a-762b25db5bf2 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98395cb7ac00, cur 1557280218 expire 1557280068 last 1557279991 May 07 18:50:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 19:03:15 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557280988/real 1557280988] req@ffff982048fd6900 x1632382376522016/t0(0) o106->fir-MDT0000@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557280995 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 19:03:15 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 206 previous similar messages May 07 19:03:23 fir-md1-s1 kernel: Lustre: 102479:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98299aa76000 x1632106212984176/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:28/0 lens 480/568 e 1 to 0 dl 1557281008 ref 2 fl Interpret:/0/0 rc 0/0 May 07 19:03:23 fir-md1-s1 kernel: Lustre: 102479:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 24 previous similar messages May 07 19:03:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 19:03:29 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages May 07 19:03:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 07 19:03:29 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages May 07 19:03:36 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557281009/real 1557281009] req@ffff982048fd6900 x1632382376522016/t0(0) o106->fir-MDT0000@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557281016 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 19:03:36 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 19:03:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 19:04:18 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557281051/real 1557281051] req@ffff982048fd6900 x1632382376522016/t0(0) o106->fir-MDT0000@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557281058 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 19:04:18 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 19:04:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 19:04:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 19:04:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 07 19:04:53 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 19:05:35 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557281128/real 1557281128] req@ffff982048fd6900 x1632382376522016/t0(0) o106->fir-MDT0000@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557281135 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 19:05:35 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 07 19:05:47 fir-md1-s1 kernel: Lustre: 102551:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9840a6b47b00 x1631544351740912/t0(0) o101->2e1837bb-385a-af64-a5d1-7a58230af8b2@10.9.0.64@o2ib4:22/0 lens 480/568 e 1 to 0 dl 1557281152 ref 2 fl Interpret:/0/0 rc 0/0 May 07 19:05:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2e1837bb-385a-af64-a5d1-7a58230af8b2 (at 10.9.0.64@o2ib4) reconnecting May 07 19:05:55 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 19:06:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
haven't heard from client f4b2b27d-5164-df59-9352-2a8dbf5b8bcd (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9851ac263800, cur 1557281186 expire 1557281036 last 1557280959 May 07 19:06:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 19:13:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7bd6652b-ae50-5135-d32c-c0df332b2aef (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984882ea9800, cur 1557281596 expire 1557281446 last 1557281369 May 07 19:13:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 19:15:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9a56a19f-ae1b-ace0-70a7-6d8e93c71c2f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9840a677d400, cur 1557281752 expire 1557281602 last 1557281525 May 07 19:15:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 19:16:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 07 19:16:18 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages May 07 19:47:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c7854b01-d446-4b98-3dd0-a771eb9a48dc (at 10.8.1.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985c7cf54000, cur 1557283672 expire 1557283522 last 1557283445 May 07 19:47:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 19:51:38 fir-md1-s1 kernel: Lustre: 102664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557283891/real 1557283891] req@ffff982048fd3300 x1632383195578192/t0(0) o106->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557283898 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 19:51:38 fir-md1-s1 kernel: Lustre: 102664:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 15 previous similar messages May 07 19:51:46 fir-md1-s1 kernel: Lustre: 102572:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9822b86f5a00 x1632106253829872/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:21/0 lens 480/568 e 1 to 0 dl 1557283911 ref 2 fl Interpret:/0/0 rc 0/0 May 07 19:51:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 19:51:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 19:51:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) May 07 19:51:52 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 19:51:59 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557283912/real 1557283912] req@ffff982c22e1e300 x1632383195578112/t0(0) o106->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557283919 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 19:51:59 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 07 19:52:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 19:52:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: 
Connection restored to (at 10.0.10.3@o2ib7) May 07 19:52:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 19:52:41 fir-md1-s1 kernel: Lustre: 102664:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557283954/real 1557283954] req@ffff982048fd3300 x1632383195578192/t0(0) o106->fir-MDT0000@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557283961 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 19:52:41 fir-md1-s1 kernel: Lustre: 102664:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 07 19:52:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 277ff991-6cec-64c6-c68e-c37b2537028b (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983551642400, cur 1557283974 expire 1557283824 last 1557283747 May 07 19:52:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 19:57:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 20:04:19 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557284652/real 1557284652] req@ffff981d0f088600 x1632383444749232/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557284659 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 20:04:19 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 20:04:27 fir-md1-s1 kernel: Lustre: 102564:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9823e13b5400 x1632106266950032/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:2/0 lens 480/568 e 1 to 0 dl 1557284672 ref 2 fl Interpret:/0/0 rc 0/0 May 07 20:04:27 fir-md1-s1 kernel: Lustre: 102564:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 20:04:33 fir-md1-s1 kernel: Lustre: 
102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557284666/real 1557284666] req@ffff981d0f088600 x1632383444749232/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557284673 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:04:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:04:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 20:04:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 20:04:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 20:04:33 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 20:04:54 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557284687/real 1557284687] req@ffff981d0f088600 x1632383444749232/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557284694 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:04:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:04:54 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 20:05:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:05:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fea369df-1415-a320-0427-042cc66dfb9c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9822cf1ee400, cur 1557284722 expire 1557284572 last 1557284495 May 07 20:05:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 20:09:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 20:09:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 20:16:41 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557285394/real 1557285394] req@ffff982ce7342400 x1632383688641056/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557285401 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 20:16:41 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 07 20:16:48 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557285401/real 1557285401] req@ffff982ce7342400 x1632383688641056/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557285408 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:16:49 fir-md1-s1 kernel: Lustre: 102667:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982276373000 x1632106280298448/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:24/0 lens 480/568 e 1 to 0 dl 1557285414 ref 2 fl Interpret:/0/0 rc 0/0 May 07 20:16:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:17:02 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557285415/real 1557285415] req@ffff982ce7342400 x1632383688641056/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557285422 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:17:02 fir-md1-s1 kernel: Lustre: 
102651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 20:17:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:17:23 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557285436/real 1557285436] req@ffff982ce7342400 x1632383688641056/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557285443 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:17:23 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 20:17:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:17:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2f4ebf14-bc7c-5c22-d321-ba8a7f17a339 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983e87b44c00, cur 1557285462 expire 1557285312 last 1557285235 May 07 20:17:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 20:19:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.1.11@o2ib6) May 07 20:19:57 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 20:29:38 fir-md1-s1 kernel: Lustre: 102710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557286171/real 1557286171] req@ffff9824687ea700 x1632383921399104/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557286178 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 20:29:38 fir-md1-s1 kernel: Lustre: 102710:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 20:29:45 fir-md1-s1 kernel: Lustre: 102710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557286178/real 1557286178] req@ffff9824687ea700 x1632383921399104/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557286185 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:29:46 fir-md1-s1 kernel: Lustre: 102739:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982bad245a00 x1632106293641872/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:21/0 lens 480/568 e 1 to 0 dl 1557286191 ref 2 fl Interpret:/0/0 rc 0/0 May 07 20:29:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:29:59 fir-md1-s1 kernel: Lustre: 102710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557286192/real 1557286192] req@ffff9824687ea700 x1632383921399104/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557286199 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:29:59 fir-md1-s1 kernel: Lustre: 
102710:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 20:30:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:30:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 20:30:13 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages May 07 20:30:20 fir-md1-s1 kernel: Lustre: 102710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557286213/real 1557286213] req@ffff9824687ea700 x1632383921399104/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557286220 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:30:20 fir-md1-s1 kernel: Lustre: 102710:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 20:30:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:30:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 20:31:02 fir-md1-s1 kernel: Lustre: 102710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557286255/real 1557286255] req@ffff9824687ea700 x1632383921399104/t0(0) o106->fir-MDT0002@10.8.10.29@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557286262 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 20:31:02 fir-md1-s1 kernel: Lustre: 102710:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 20:31:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c29839db-474d-e164-4d2b-80ceb9a5f39c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98382e263000, cur 1557286266 expire 1557286116 last 1557286039 May 07 20:31:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 20:31:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c29839db-474d-e164-4d2b-80ceb9a5f39c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982816ae0800, cur 1557286268 expire 1557286118 last 1557286041 May 07 20:45:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c63fa75e-db43-02cb-1e97-c490767a8a3c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9837377f6c00, cur 1557287145 expire 1557286995 last 1557286918 May 07 20:45:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 20:49:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 20:49:07 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 07 21:30:28 fir-md1-s1 kernel: Lustre: 102570:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 07 22:15:46 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557292539/real 1557292539] req@ffff9820d0771200 x1632385803922608/t0(0) o106->fir-MDT0002@10.8.25.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557292546 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 22:15:55 fir-md1-s1 kernel: Lustre: 102651:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff981d1923f500 x1632106363321872/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:29/0 lens 480/568 e 1 to 0 dl 1557292559 ref 2 fl Interpret:/0/0 rc 0/0 May 07 22:16:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 22:16:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 
22:16:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 22:16:01 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557292553/real 1557292553] req@ffff9820d0771200 x1632385803922608/t0(0) o106->fir-MDT0002@10.8.25.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557292560 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 22:16:01 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 22:16:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 22:16:22 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557292575/real 1557292575] req@ffff9820d0771200 x1632385803922608/t0(0) o106->fir-MDT0002@10.8.25.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557292582 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 22:16:22 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 22:16:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 22:17:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 22:17:04 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557292617/real 1557292617] req@ffff9820d0771200 x1632385803922608/t0(0) o106->fir-MDT0002@10.8.25.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557292624 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 22:17:04 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 22:17:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting 
May 07 22:17:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 07 22:17:24 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 07 22:17:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 22:18:21 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557292694/real 1557292694] req@ffff9820d0771200 x1632385803922608/t0(0) o106->fir-MDT0002@10.8.25.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557292701 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 22:18:21 fir-md1-s1 kernel: Lustre: 101913:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 07 22:18:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 07 22:18:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 07 22:18:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 209fb82a-ea02-ddd0-64cc-71f136a8206e (at 10.8.25.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c3d876800, cur 1557292731 expire 1557292581 last 1557292504 May 07 22:18:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 22:19:00 fir-md1-s1 kernel: LNet: Service thread pid 101913 was inactive for 200.34s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
May 07 22:19:00 fir-md1-s1 kernel: Pid: 101913, comm: mdt00_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
May 07 22:19:00 fir-md1-s1 kernel: Call Trace:
May 07 22:19:00 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt]
May 07 22:19:00 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt]
May 07 22:19:00 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt]
May 07 22:19:00 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
May 07 22:19:00 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
May 07 22:19:00 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
May 07 22:19:00 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
May 07 22:19:00 fir-md1-s1 kernel: [] 0xffffffffffffffff
May 07 22:19:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557292740.101913
May 07 22:19:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 209fb82a-ea02-ddd0-64cc-71f136a8206e (at 10.8.25.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c90ae5800, cur 1557292742 expire 1557292592 last 1557292515
May 07 22:19:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
May 07 22:19:02 fir-md1-s1 kernel: LNet: Service thread pid 101913 completed after 202.07s.
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 07 22:31:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 689c4a40-670a-db67-b69e-ba61a332674f (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982c943b1800, cur 1557293473 expire 1557293323 last 1557293246 May 07 22:35:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.29@o2ib6) May 07 22:35:21 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages May 07 22:43:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 740b1e7f-c950-a6f4-1c52-52725a57adef (at 10.9.102.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982cfbe9a400, cur 1557294182 expire 1557294032 last 1557293955 May 07 22:43:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 22:46:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.25.5@o2ib6) May 07 22:46:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 07 23:10:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.1@o2ib4) May 07 23:10:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 01:33:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c6c69886-c1e4-472e-425d-70c33f0565ee (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983c3869f000, cur 1557304411 expire 1557304261 last 1557304184 May 08 01:33:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 01:34:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 01:34:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 01:39:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7dff8a6e-e3f6-696d-9545-c3ce3c471f1b (at 10.9.101.1@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9859f2207400, cur 1557304755 expire 1557304605 last 1557304528 May 08 01:39:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 01:39:29 fir-md1-s1 kernel: Lustre: 102562:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557304762/real 1557304762] req@ffff982bc7d77800 x1632389375537200/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557304769 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 01:39:29 fir-md1-s1 kernel: Lustre: 102562:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 08 01:39:37 fir-md1-s1 kernel: Lustre: 102710:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98241a9cce00 x1632106502282016/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:12/0 lens 480/568 e 1 to 0 dl 1557304782 ref 2 fl Interpret:/0/0 rc 0/0 May 08 01:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 01:39:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 01:39:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 01:39:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 01:39:50 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557304783/real 1557304783] req@ffff9821dace4500 x1632389375537104/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557304790 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 01:39:50 fir-md1-s1 kernel: Lustre: 102456:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 08 01:40:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 01:40:04 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 01:40:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 01:40:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 01:40:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1af90e45-c644-3e4c-21cf-a7be75ad0893 (at 10.8.23.14@o2ib6) in 182 seconds. I think it's dead, and I am evicting it. exp ffff982292318400, cur 1557304831 expire 1557304681 last 1557304649 May 08 01:40:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 01:41:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e34f2f9c-5bd9-8f51-7ce3-52d92b654bad (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c89751400, cur 1557304876 expire 1557304726 last 1557304649 May 08 01:41:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 01:41:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 01:51:45 fir-md1-s1 kernel: Lustre: 102479:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982a895f0600 x1632106508052448/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:20/0 lens 480/568 e 1 to 0 dl 1557305510 ref 2 fl Interpret:/0/0 rc 0/0 May 08 01:51:45 fir-md1-s1 kernel: Lustre: 102479:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 08 01:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 01:51:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 01:51:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 01:52:01 fir-md1-s1 kernel: Lustre: 
102451:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557305490/real 1557305490] req@ffff9829fbf2a100 x1632389576857520/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557305521 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 01:52:01 fir-md1-s1 kernel: Lustre: 102451:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 08 01:52:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 01:52:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 01:52:32 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557305521/real 1557305521] req@ffff982a30b05400 x1632389576857632/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557305552 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 01:52:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 01:52:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 01:52:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 01:52:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 01:53:03 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557305552/real 1557305552] req@ffff982a30b05400 x1632389576857632/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557305583 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 01:53:03 fir-md1-s1 kernel: Lustre: 102388:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 08 01:53:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored 
to (at 10.0.10.3@o2ib7) May 08 01:53:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d29c4427-36f7-2607-18d7-9a97ca711ca0 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9830d072a000, cur 1557305610 expire 1557305460 last 1557305383 May 08 01:53:34 fir-md1-s1 kernel: Lustre: 102451:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557305583/real 1557305583] req@ffff9829fbf2a100 x1632389576857520/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557305614 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 01:53:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 01:53:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 01:53:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 01:53:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d29c4427-36f7-2607-18d7-9a97ca711ca0 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983ce3a86c00, cur 1557305626 expire 1557305476 last 1557305399 May 08 01:53:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 02:05:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0eb998ea-51bd-a0fa-2f8b-6090ae7a33f4 (at 10.9.101.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982cf531f400, cur 1557306344 expire 1557306194 last 1557306117 May 08 02:06:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.1@o2ib4) May 08 02:06:09 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 08 02:06:12 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557306365/real 1557306365] req@ffff982299d92a00 x1632389829546608/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557306372 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 02:06:12 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 08 02:06:19 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557306372/real 1557306372] req@ffff982b74f1c200 x1632389829546496/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557306379 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:06:20 fir-md1-s1 kernel: Lustre: 102440:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982238a8e000 x1632106513735584/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:25/0 lens 480/568 e 1 to 0 dl 1557306385 ref 2 fl Interpret:/0/0 rc 0/0 May 08 02:06:20 fir-md1-s1 kernel: Lustre: 102440:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 08 02:06:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 02:06:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 02:06:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 02:06:33 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557306386/real 1557306386] 
req@ffff982b74f1c200 x1632389829546496/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557306393 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:06:33 fir-md1-s1 kernel: Lustre: 102716:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 08 02:06:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 02:06:40 fir-md1-s1 kernel: LustreError: 101902:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from glimpse AST (req@ffff982299d92a00 x1632389829546608 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9822032ae300/0xce8853b93dc3d266 lrc: 4/0,0 mode: PW/PW res: [0x2c0024164:0x4:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0x1f88275d89ddc61e expref: 41 pid: 102381 timeout: 0 lvb_type: 0 May 08 02:06:40 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 08 02:06:40 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 336s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff983202e18480/0xce8853b93dc39454 lrc: 4/0,0 mode: PW/PW res: [0x2c0024164:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0x1f88275d89ddc5e6 expref: 42 pid: 102371 timeout: 0 lvb_type: 0 May 08 02:06:40 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 08 02:07:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 150c717d-fe27-1f82-6705-4c7280878b47 (at 10.8.23.14@o2ib6) in 203 seconds. I think it's dead, and I am evicting it. 
exp ffff984c177d4400, cur 1557306420 expire 1557306270 last 1557306217 May 08 02:07:00 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 08 02:23:27 fir-md1-s1 kernel: Lustre: 102381:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557307400/real 1557307400] req@ffff9820362c2400 x1632390130651200/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557307407 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 02:23:27 fir-md1-s1 kernel: Lustre: 102381:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 08 02:23:34 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557307407/real 1557307407] req@ffff982a879abf00 x1632390130652272/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557307414 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:23:35 fir-md1-s1 kernel: Lustre: 102672:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9821f9b66900 x1632106521255152/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:10/0 lens 480/568 e 1 to 0 dl 1557307420 ref 2 fl Interpret:/0/0 rc 0/0 May 08 02:23:35 fir-md1-s1 kernel: Lustre: 102672:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 08 02:23:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 02:23:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 02:23:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 02:23:41 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557307414/real 1557307414] req@ffff982a879abf00 x1632390130652272/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 
296/280 e 0 to 1 dl 1557307421 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:23:41 fir-md1-s1 kernel: Lustre: 102651:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 02:23:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 02:23:48 fir-md1-s1 kernel: LustreError: 102651:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from glimpse AST (req@ffff982a879abf00 x1632390130652272 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff98318a384140/0xce8853b9517b7b4d lrc: 4/0,0 mode: PW/PW res: [0x2c0024165:0x4:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0x2985d0606f601a55 expref: 41 pid: 102605 timeout: 0 lvb_type: 0 May 08 02:23:48 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 08 02:23:48 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 328s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff982252dba880/0xce8853b9517b59b9 lrc: 4/0,0 mode: PW/PW res: [0x2c0024165:0x3:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.23.14@o2ib6 remote: 0x2985d0606f601a1d expref: 42 pid: 102584 timeout: 0 lvb_type: 0 May 08 02:23:48 fir-md1-s1 kernel: LustreError: 102651:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 2 previous similar messages May 08 02:24:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9d757871-4b65-c1c5-b90b-b7792929cc38 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff983c3a271000, cur 1557307480 expire 1557307330 last 1557307253 May 08 02:24:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 02:28:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bde0178d-16e8-13c4-36ce-9a7655cac98f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984cb15be800, cur 1557307705 expire 1557307555 last 1557307478 May 08 02:28:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 02:28:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 02:28:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 02:28:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.28@o2ib4) May 08 02:28:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 02:33:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9ed506bc-1628-3beb-cb4a-acb203d71c54 (at 10.9.101.9@o2ib4) May 08 02:33:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 02:34:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eed2a5c8-1fbf-9380-11c4-6cc54c07792d (at 10.8.1.18@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985953785800, cur 1557308065 expire 1557307915 last 1557307838 May 08 02:34:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 02:36:34 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557308187/real 1557308187] req@ffff9823a43c8000 x1632390354781232/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557308194 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 02:36:34 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557308187/real 1557308187] req@ffff982be2baf500 x1632390354781328/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557308194 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 02:36:34 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 08 02:36:34 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 02:36:41 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557308194/real 1557308194] req@ffff982be2baf500 x1632390354781328/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557308201 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:36:41 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 02:36:42 fir-md1-s1 kernel: Lustre: 101750:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982760fe2100 x1632106527589584/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:17/0 lens 480/568 e 1 to 0 dl 1557308207 ref 2 fl Interpret:/0/0 rc 0/0 May 08 02:36:42 fir-md1-s1 kernel: Lustre: 101750:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar 
message May 08 02:36:48 fir-md1-s1 kernel: Lustre: 102565:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557308201/real 1557308201] req@ffff982be2ba8300 x1632390354783040/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557308208 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:36:48 fir-md1-s1 kernel: Lustre: 102565:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 02:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 02:36:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 02:36:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 02:36:55 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557308208/real 1557308208] req@ffff982be2baf500 x1632390354781328/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557308215 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:36:55 fir-md1-s1 kernel: Lustre: 101902:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 08 02:37:02 fir-md1-s1 kernel: Lustre: 102565:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557308215/real 1557308215] req@ffff982be2ba8300 x1632390354783040/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557308222 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 02:37:16 fir-md1-s1 kernel: Lustre: 102565:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557308229/real 1557308229] req@ffff982be2ba8300 x1632390354783040/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 
1 dl 1557308236 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:37:16 fir-md1-s1 kernel: Lustre: 102565:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 08 02:37:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 02:37:37 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557308250/real 1557308250] req@ffff9823a43c8000 x1632390354781232/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557308257 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 02:37:37 fir-md1-s1 kernel: Lustre: 102731:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 08 02:37:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 80425c46-79e7-b0ae-c3b8-a93f00e15555 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9832ad275000, cur 1557308266 expire 1557308116 last 1557308039 May 08 02:37:46 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages May 08 02:38:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 02:38:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 08 03:01:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 243cfd94-2dc6-778a-658a-af871c275931 (at 10.8.1.6@o2ib6) May 08 03:01:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 03:02:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.1.14@o2ib6) May 08 03:02:03 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 08 03:08:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.1.13@o2ib6) May 08 03:08:30 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 08 03:46:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 
b23f379f-72a3-1922-6d02-f29474e14fa4 (at 10.9.108.49@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ca491d000, cur 1557312383 expire 1557312233 last 1557312156 May 08 03:46:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 04:16:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.49@o2ib4) May 08 04:16:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 08 04:18:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 04:18:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 04:18:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 10ab4e14-a395-92da-8319-8f900ac08944 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982746f26c00, cur 1557314331 expire 1557314181 last 1557314104 May 08 04:18:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 04:42:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4f066b5e-4fb2-edc0-59a3-4bb970d01f67 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984c72266c00, cur 1557315753 expire 1557315603 last 1557315526 May 08 04:42:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 04:42:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4f066b5e-4fb2-edc0-59a3-4bb970d01f67 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98519deaf800, cur 1557315763 expire 1557315613 last 1557315536 May 08 04:42:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 04:46:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 08 04:46:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 05:05:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ce0f7007-d505-ed89-0b36-6f859b7c8ef2 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9832ba33a400, cur 1557317159 expire 1557317009 last 1557316932 May 08 05:09:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 08 05:09:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 07:11:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 195c8595-9c38-b754-9d29-9f4dc81cde91 (at 10.9.101.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9858f1acb800, cur 1557324687 expire 1557324537 last 1557324460 May 08 07:11:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 07:20:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4733b776-dee5-b68f-afc7-95bff939ffcb (at 10.9.102.70@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9846712ff800, cur 1557325234 expire 1557325084 last 1557325007 May 08 07:20:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 07:20:48 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2373f632-b729-137f-73b9-b700a163eaf9 (at 10.9.102.70@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982bc467d000, cur 1557325248 expire 1557325098 last 1557325021 May 08 07:39:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 7d4b41b7-8bbf-f5cf-3e4e-9ba2be5016c7 (at 10.9.101.35@o2ib4) May 08 07:39:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 07:45:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.70@o2ib4) May 08 07:45:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 08:01:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 596fa154-8fdb-18a8-fa7d-8129544c0d55 (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9822cfa37400, cur 1557327679 expire 1557327529 last 1557327452 May 08 08:01:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 08:21:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 94ff073b-78cb-a7d4-d32c-307d83b6df33 (at 10.8.23.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984700636800, cur 1557328895 expire 1557328745 last 1557328668 May 08 08:21:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 08:21:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 94ff073b-78cb-a7d4-d32c-307d83b6df33 (at 10.8.23.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985ce3ca7000, cur 1557328917 expire 1557328767 last 1557328690 May 08 08:21:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 08:23:46 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557329015/real 1557329015] req@ffff981f6a793000 x1632396189910240/t0(0) o104->fir-MDT0000@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557329026 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 08:23:46 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 08 08:23:50 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98321c2d4e00 x1631739527491488/t0(0) o101->6657e991-5ff9-aba5-326a-9311d655d450@10.8.30.24@o2ib6:25/0 lens 576/3264 e 1 to 0 dl 1557329035 ref 2 fl Interpret:/0/0 rc 0/0 May 08 08:23:50 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 08 08:23:51 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983ba612f800 x1631740374998528/t0(0) o101->d14f063c-396a-5933-6628-c462c1fc2852@10.8.21.5@o2ib6:26/0 lens 576/3264 e 1 to 0 dl 1557329036 ref 2 fl Interpret:/0/0 rc 0/0 May 08 08:23:51 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 176 previous similar messages May 08 08:23:52 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983bf8ea7800 x1632921080662496/t0(0) o101->e931b561-45a6-d4d8-a002-63ccb3beee09@10.9.109.30@o2ib4:27/0 lens 568/0 e 1 to 0 dl 1557329037 ref 2 fl New:/0/ffffffff rc 0/-1 May 08 08:23:52 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 187 
previous similar messages May 08 08:23:54 fir-md1-s1 kernel: Lustre: 102592:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff98371b734800 x1631551067861392/t0(0) o101->d306ff79-fb9e-2f98-a900-35120cbd847f@10.9.101.7@o2ib4:29/0 lens 576/0 e 1 to 0 dl 1557329039 ref 2 fl New:/0/ffffffff rc 0/-1 May 08 08:23:54 fir-md1-s1 kernel: Lustre: 102592:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 29 previous similar messages May 08 08:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 74b14fbc-b3cc-e771-44b0-23517ef5c46c (at 10.9.0.1@o2ib4) reconnecting May 08 08:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 7f7fa514-be7d-b6c7-4718-4b4427ec7cee (at 10.8.30.25@o2ib6) May 08 08:23:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 08:23:56 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages May 08 08:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 446d2031-5a06-05f4-6e28-6d5149ce1c88 (at 10.9.104.39@o2ib4) reconnecting May 08 08:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.108.71@o2ib4) May 08 08:23:56 fir-md1-s1 kernel: Lustre: Skipped 42 previous similar messages May 08 08:23:56 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages May 08 08:23:57 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557329026/real 1557329026] req@ffff981f6a793000 x1632396189910240/t0(0) o104->fir-MDT0000@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557329037 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 08:23:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2027e649-8bcd-4ca1-6dcb-dd11dcd45e21 (at 10.9.101.17@o2ib4) reconnecting May 08 08:23:57 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages May 08 08:23:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to e1bff743-7e8a-4b1f-5760-71115d7351a2 (at 
10.9.101.17@o2ib4) May 08 08:23:57 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages May 08 08:23:58 fir-md1-s1 kernel: Lustre: 102592:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9832117d0900 x1631535865080240/t0(0) o101->6ea10cfc-48e7-f6e7-b834-4eb6674e3061@10.9.102.48@o2ib4:3/0 lens 576/0 e 1 to 0 dl 1557329043 ref 2 fl New:/0/ffffffff rc 0/-1 May 08 08:23:58 fir-md1-s1 kernel: Lustre: 102592:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 37 previous similar messages May 08 08:23:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 2faef2d8-dc67-f384-07b6-111f344194c1 (at 10.9.101.65@o2ib4) reconnecting May 08 08:23:59 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages May 08 08:23:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 034dbae0-fbe3-61a3-de53-979fb1d61338 (at 10.9.101.65@o2ib4) May 08 08:24:00 fir-md1-s1 kernel: Lustre: Skipped 22 previous similar messages May 08 08:24:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client b4d83e54-8cb6-ea71-956c-e7a98e667a27 (at 10.8.27.26@o2ib6) reconnecting May 08 08:24:04 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages May 08 08:24:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.27.26@o2ib6) May 08 08:24:04 fir-md1-s1 kernel: Lustre: Skipped 21 previous similar messages May 08 08:24:06 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9838fc763c00 x1631643565684544/t0(0) o101->c8b77898-6bcc-54d7-a771-0cfafa351f86@10.9.101.6@o2ib4:11/0 lens 576/0 e 1 to 0 dl 1557329051 ref 2 fl New:/0/ffffffff rc 0/-1 May 08 08:24:06 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 159 previous similar messages May 08 08:24:08 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for 
slow reply: [sent 1557329037/real 1557329037] req@ffff981f6a793000 x1632396189910240/t0(0) o104->fir-MDT0000@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557329048 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 08:24:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client da74a6ae-9e7c-db01-39f9-c8d7b66544b1 (at 10.9.101.19@o2ib4) reconnecting May 08 08:24:12 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages May 08 08:24:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.101.19@o2ib4) May 08 08:24:12 fir-md1-s1 kernel: Lustre: Skipped 71 previous similar messages May 08 08:24:22 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff983ae1e90f00 x1632868175145088/t0(0) o101->a90830c2-0409-137c-0c8c-2926248a6e22@10.8.1.29@o2ib6:27/0 lens 576/0 e 1 to 0 dl 1557329067 ref 2 fl New:/0/ffffffff rc 0/-1 May 08 08:24:22 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 152 previous similar messages May 08 08:24:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a90830c2-0409-137c-0c8c-2926248a6e22 (at 10.8.1.29@o2ib6) reconnecting May 08 08:24:28 fir-md1-s1 kernel: Lustre: Skipped 197 previous similar messages May 08 08:24:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.1.29@o2ib6) May 08 08:24:28 fir-md1-s1 kernel: Lustre: Skipped 197 previous similar messages May 08 08:24:30 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557329059/real 1557329059] req@ffff981f6a793000 x1632396189910240/t0(0) o104->fir-MDT0000@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557329070 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 08:24:30 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 08:24:54 fir-md1-s1 kernel: Lustre: 
102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982e14711500 x1631819853702704/t0(0) o101->1b68b593-b8e6-1e8f-59f0-b5cdd3c0c8e8@10.8.20.33@o2ib6:29/0 lens 576/0 e 0 to 0 dl 1557329099 ref 2 fl New:/2/ffffffff rc 0/-1 May 08 08:24:54 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 390 previous similar messages May 08 08:25:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client bfd877a1-6be6-6bca-d52b-a9ea94889bcb (at 10.9.108.25@o2ib4) reconnecting May 08 08:25:00 fir-md1-s1 kernel: Lustre: Skipped 420 previous similar messages May 08 08:25:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to f5f92765-342a-418f-7600-2b9a7cbc8f11 (at 10.9.108.25@o2ib4) May 08 08:25:00 fir-md1-s1 kernel: Lustre: Skipped 420 previous similar messages May 08 08:25:03 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557329092/real 1557329092] req@ffff981f6a793000 x1632396189910240/t0(0) o104->fir-MDT0000@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557329103 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 08:25:03 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 08 08:25:05 fir-md1-s1 kernel: LustreError: 102598:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557329015, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9843f9eb5100/0xce8853bb1b01d7d1 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 853 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102598 timeout: 0 lvb_type: 0 May 08 08:25:05 fir-md1-s1 kernel: LustreError: 102598:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 117 previous similar messages May 08 08:25:05 fir-md1-s1 kernel: LustreError: 
102716:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557329015, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff982784a4cc80/0xce8853bb1b04c334 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 855 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102716 timeout: 0 lvb_type: 0 May 08 08:25:05 fir-md1-s1 kernel: LustreError: 102716:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 30 previous similar messages May 08 08:25:06 fir-md1-s1 kernel: LustreError: 102603:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557329016, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9849e1ab6300/0xce8853bb1b07e8ea lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 855 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102603 timeout: 0 lvb_type: 0 May 08 08:25:06 fir-md1-s1 kernel: LustreError: 102603:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 110 previous similar messages May 08 08:25:09 fir-md1-s1 kernel: LustreError: 101911:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557329019, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9826cffd3180/0xce8853bb1b07fdc0 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 861 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 101911 timeout: 0 lvb_type: 0 May 08 08:25:09 fir-md1-s1 kernel: LustreError: 101911:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 17 previous similar messages May 08 08:25:14 fir-md1-s1 kernel: LustreError: 102609:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557329023, 90s ago); not entering recovery 
in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff984cb7e65580/0xce8853bb1b080a62 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 863 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102609 timeout: 0 lvb_type: 0 May 08 08:25:14 fir-md1-s1 kernel: LustreError: 102609:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 21 previous similar messages May 08 08:25:22 fir-md1-s1 kernel: LustreError: 102642:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557329032, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff983bc8634380/0xce8853bb1b080e60 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 867 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102642 timeout: 0 lvb_type: 0 May 08 08:25:22 fir-md1-s1 kernel: LustreError: 102642:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages May 08 08:25:38 fir-md1-s1 kernel: LustreError: 102684:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557329048, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9835d078a880/0xce8853bb1b08169b lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102684 timeout: 0 lvb_type: 0 May 08 08:25:38 fir-md1-s1 kernel: LustreError: 102684:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages May 08 08:25:58 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9835e8a4da00 x1631305233625584/t0(0) o101->ac30e7a3-0c2a-d332-2695-2869a8e06f55@10.8.29.7@o2ib6:3/0 lens 1768/0 e 0 to 0 dl 1557329163 ref 2 fl New:/2/ffffffff 
rc 0/-1 May 08 08:25:58 fir-md1-s1 kernel: Lustre: 102614:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 922 previous similar messages May 08 08:26:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 92186514-3447-4842-6088-bc954b31a07a (at 10.9.101.49@o2ib4) reconnecting May 08 08:26:04 fir-md1-s1 kernel: Lustre: Skipped 1066 previous similar messages May 08 08:26:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.101.49@o2ib4) May 08 08:26:04 fir-md1-s1 kernel: Lustre: Skipped 1067 previous similar messages May 08 08:26:09 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557329158/real 1557329158] req@ffff981f6a793000 x1632396189910240/t0(0) o104->fir-MDT0000@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557329169 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 08:26:09 fir-md1-s1 kernel: Lustre: 102599:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 08 08:26:09 fir-md1-s1 kernel: LustreError: 102599:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.114.5@o2ib4) failed to reply to blocking AST (req@ffff981f6a793000 x1632396189910240 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff98356ba67980/0xce8853bae0ff6f08 lrc: 4/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x60200400000020 nid: 10.9.114.5@o2ib4 remote: 0x57d1e8f824f5fc6d expref: 83737 pid: 101687 timeout: 854579 lvb_type: 0 May 08 08:26:09 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.114.5@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 08 08:26:09 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 08 08:26:09 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.114.5@o2ib4 ns: mdt-fir-MDT0000_UUID lock: 
ffff98356ba67980/0xce8853bae0ff6f08 lrc: 3/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x60200400000020 nid: 10.9.114.5@o2ib4 remote: 0x57d1e8f824f5fc6d expref: 83738 pid: 101687 timeout: 0 lvb_type: 0 May 08 08:26:12 fir-md1-s1 kernel: LustreError: 88049:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557329082, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9851f136ca40/0xce8853bb1b0823f3 lrc: 3/1,0 mode: --/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 88049 timeout: 0 lvb_type: 0 May 08 08:26:12 fir-md1-s1 kernel: LustreError: 88049:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 30 previous similar messages May 08 08:26:22 fir-md1-s1 kernel: Lustre: 88062:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:16s); client may timeout. 
req@ffff985c82e41200 x1631605310090528/t0(0) o101->5c4c5b6a-001d-e26d-f4d4-23e598bc49a5@10.9.103.13@o2ib4:6/0 lens 576/0 e 0 to 0 dl 1557329166 ref 1 fl Interpret:/2/ffffffff rc 0/-1 May 08 08:26:22 fir-md1-s1 kernel: LustreError: 88048:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.108.54@o2ib4: deadline 30:1s ago req@ffff98519ea64e00 x1632822551552064/t0(0) o101->dcf713a7-e466-63ba-d7fd-f8bd360f2bc9@10.9.108.54@o2ib4:21/0 lens 576/0 e 0 to 0 dl 1557329181 ref 1 fl Interpret:/2/ffffffff rc 0/-1 May 08 08:26:22 fir-md1-s1 kernel: LustreError: 88048:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 18 previous similar messages May 08 08:26:22 fir-md1-s1 kernel: Lustre: 88062:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1854 previous similar messages May 08 08:27:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 47468953-81c7-2310-9c7c-4b0493347857 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984be0f6c000, cur 1557329238 expire 1557329088 last 1557329011 May 08 08:36:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 08 08:36:36 fir-md1-s1 kernel: Lustre: Skipped 340 previous similar messages May 08 08:49:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.23.9@o2ib6) May 08 08:49:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 09:06:23 fir-md1-s1 kernel: LNetError: 101318:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 08 09:15:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0799f8ec-71a3-852c-92f0-8f09c24981f0 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff98344b24cc00, cur 1557332109 expire 1557331959 last 1557331882 May 08 09:15:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 09:15:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0799f8ec-71a3-852c-92f0-8f09c24981f0 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff985c6966a800, cur 1557332115 expire 1557331965 last 1557331888 May 08 09:15:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 09:24:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 08 09:24:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 09:37:07 fir-md1-s1 kernel: Lustre: 102394:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557333420/real 1557333420] req@ffff982cc8be5400 x1632397468184352/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557333427 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 09:37:15 fir-md1-s1 kernel: Lustre: 88061:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9851493d2400 x1631537428073632/t0(0) o101->aa0da401-4118-f760-b185-8e18141c9428@10.9.104.36@o2ib4:20/0 lens 592/3264 e 1 to 0 dl 1557333440 ref 2 fl Interpret:/0/0 rc 0/0 May 08 09:37:15 fir-md1-s1 kernel: Lustre: 88061:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 397 previous similar messages May 08 09:37:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 284f7498-dddc-7b5c-a1fb-23c91ee501c0 (at 10.8.27.14@o2ib6) reconnecting May 08 09:37:21 fir-md1-s1 kernel: Lustre: Skipped 341 previous similar messages May 08 09:37:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b39fa982-051a-897c-6b07-fb455a7a2cb3 (at 10.8.27.14@o2ib6) May 08 09:37:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 09:37:28 fir-md1-s1 kernel: Lustre: 102394:0:(client.c:2132:ptlrpc_expire_one_request()) 
@@@ Request sent has timed out for slow reply: [sent 1557333441/real 1557333441] req@ffff982cc8be5400 x1632397468184352/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557333448 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 09:37:28 fir-md1-s1 kernel: Lustre: 102394:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 08 09:37:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 27e13f32-faef-c04b-6cae-c5005c29d4ee (at 10.9.105.13@o2ib4) May 08 09:37:29 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages May 08 09:37:32 fir-md1-s1 kernel: Lustre: 102700:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff982758347b00 x1631537411495632/t0(0) o101->f3bba4e8-9568-4001-6257-88537741a8c9@10.8.29.3@o2ib6:7/0 lens 576/3264 e 1 to 0 dl 1557333457 ref 2 fl Interpret:/0/0 rc 0/0 May 08 09:37:32 fir-md1-s1 kernel: Lustre: 102700:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 44 previous similar messages May 08 09:37:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client f3bba4e8-9568-4001-6257-88537741a8c9 (at 10.8.29.3@o2ib6) reconnecting May 08 09:37:38 fir-md1-s1 kernel: Lustre: Skipped 41 previous similar messages May 08 09:37:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 4d03dd65-278a-25e1-de8a-f4877d488a5e (at 10.8.17.17@o2ib6) May 08 09:37:46 fir-md1-s1 kernel: Lustre: Skipped 49 previous similar messages May 08 09:38:03 fir-md1-s1 kernel: Lustre: 102394:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557333476/real 1557333476] req@ffff982cc8be5400 x1632397468184352/t0(0) o104->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557333483 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 09:38:03 fir-md1-s1 kernel: Lustre: 102394:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 08 09:38:04 fir-md1-s1 kernel: Lustre: 
102512:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff983187374e00 x1631551943465200/t0(0) o101->62f2b488-84b8-fb8d-fa79-5fe434be1423@10.8.1.35@o2ib6:9/0 lens 576/3264 e 0 to 0 dl 1557333489 ref 2 fl Interpret:/0/0 rc 0/0 May 08 09:38:04 fir-md1-s1 kernel: Lustre: 102512:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 59 previous similar messages May 08 09:38:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5d6a1f91-8e73-2d03-02f2-a06db17ace68 (at 10.9.104.33@o2ib4) reconnecting May 08 09:38:10 fir-md1-s1 kernel: Lustre: Skipped 139 previous similar messages May 08 09:38:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.28.9@o2ib6) May 08 09:38:18 fir-md1-s1 kernel: Lustre: Skipped 138 previous similar messages May 08 09:38:30 fir-md1-s1 kernel: LustreError: 88062:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557333420, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff985c73e05c40/0xce8853bb7e2b72f1 lrc: 3/1,0 mode: --/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 917 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 88062 timeout: 0 lvb_type: 0 May 08 09:38:30 fir-md1-s1 kernel: LustreError: 88062:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages May 08 09:38:38 fir-md1-s1 kernel: LustreError: 101684:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557333428, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff98220be9f980/0xce8853bb7e52e8ae lrc: 3/1,0 mode: --/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 937 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 101684 timeout: 0 lvb_type: 0 May 08 09:38:38 fir-md1-s1 kernel: LustreError: 
101684:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 35 previous similar messages May 08 09:38:54 fir-md1-s1 kernel: LustreError: 102673:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557333444, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff983318705a00/0xce8853bb7e99f0a8 lrc: 3/1,0 mode: --/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 974 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102673 timeout: 0 lvb_type: 0 May 08 09:38:54 fir-md1-s1 kernel: LustreError: 102673:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 36 previous similar messages May 08 09:39:08 fir-md1-s1 kernel: Lustre: 102672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557333541/real 1557333541] req@ffff9822d34a6600 x1632397500190752/t0(0) o106->fir-MDT0002@10.9.114.5@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557333548 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 09:39:08 fir-md1-s1 kernel: Lustre: 102672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 08 09:39:08 fir-md1-s1 kernel: Lustre: 102441:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-6), not sending early reply req@ffff98316b6ba400 x1632890829367888/t0(0) o101->1eab6938-775c-a2cf-ec8a-88384d12d9c6@10.8.9.9@o2ib6:13/0 lens 576/3264 e 0 to 0 dl 1557333553 ref 2 fl Interpret:/0/0 rc 0/0 May 08 09:39:08 fir-md1-s1 kernel: Lustre: 102441:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 125 previous similar messages May 08 09:39:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 7bc73a34-3e7e-c2f0-81f2-d0da70be75c4 (at 10.9.108.26@o2ib4) reconnecting May 08 09:39:15 fir-md1-s1 kernel: Lustre: Skipped 454 previous similar messages May 08 09:39:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 96532258-2442-6bbe-2470-71329f1744db (at 
10.9.105.65@o2ib4) May 08 09:39:23 fir-md1-s1 kernel: Lustre: Skipped 465 previous similar messages May 08 09:39:27 fir-md1-s1 kernel: LustreError: 102478:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557333477, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff98228ceb8fc0/0xce8853bb7f2725e3 lrc: 3/1,0 mode: --/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 1050 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102478 timeout: 0 lvb_type: 0 May 08 09:39:27 fir-md1-s1 kernel: LustreError: 102478:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 70 previous similar messages May 08 09:39:34 fir-md1-s1 kernel: LustreError: 102394:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.114.5@o2ib4) failed to reply to blocking AST (req@ffff982cc8be5400 x1632397468184352 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff984c978a9440/0xce8853bb7cec2597 lrc: 4/0,0 mode: PR/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 1061 type: IBT flags: 0x60200400000020 nid: 10.9.114.5@o2ib4 remote: 0xf7af65eb2474dc8d expref: 848 pid: 102633 timeout: 858988 lvb_type: 0 May 08 09:39:34 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.9.114.5@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 08 09:39:34 fir-md1-s1 kernel: LustreError: 101500:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.114.5@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff984c978a9440/0xce8853bb7cec2597 lrc: 3/0,0 mode: PR/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 1061 type: IBT flags: 0x60200400000020 nid: 10.9.114.5@o2ib4 remote: 0xf7af65eb2474dc8d expref: 849 pid: 102633 timeout: 0 lvb_type: 0 May 08 09:39:34 fir-md1-s1 kernel: Lustre: 102477:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated 
(92:1s); client may timeout. req@ffff981d19fe5400 x1631535505240752/t0(0) o101->205cae49-b70d-4635-7302-5f62b1c05bbe@10.9.102.18@o2ib4:1/0 lens 576/536 e 0 to 0 dl 1557333573 ref 1 fl Complete:/0/0 rc 0/0 May 08 09:39:34 fir-md1-s1 kernel: Lustre: 102477:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages May 08 09:40:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6843ac6f-804b-c34b-714d-77bb2b66baeb (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9851b6fa5c00, cur 1557333635 expire 1557333485 last 1557333408 May 08 09:49:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 08 09:49:22 fir-md1-s1 kernel: Lustre: Skipped 114 previous similar messages May 08 10:01:27 fir-md1-s1 kernel: Lustre: 102599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 08 10:05:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8c4b439a-dcfe-1ea3-7954-703f216b7103 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984bfc6ff000, cur 1557335145 expire 1557334995 last 1557334918 May 08 10:05:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 10:10:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.5@o2ib4) May 08 10:10:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:30:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c6da3458-72ef-a673-d8a4-96c49b3c1790 (at 10.8.1.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985bba251400, cur 1557336638 expire 1557336488 last 1557336411 May 08 10:30:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:31:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 10:31:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:33:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d9da3e15-6409-b80e-0818-87f0f6b459fb (at 10.9.108.43@o2ib4) May 08 10:33:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:37:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.109.7@o2ib4) May 08 10:37:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:38:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 17877c6e-0afe-2eb8-317c-b22f20566377 (at 10.9.108.4@o2ib4) May 08 10:38:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:39:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 66e1a9ab-9220-f034-4a7b-9cb08b8b8802 (at 10.9.103.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982ca491f000, cur 1557337199 expire 1557337049 last 1557336972 May 08 10:39:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages May 08 10:45:21 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557337514/real 1557337514] req@ffff982bfd724500 x1632398578402368/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557337521 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 10:45:21 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 08 10:45:30 fir-md1-s1 kernel: Lustre: 102739:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9820d5276300 x1632106828640320/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:4/0 lens 480/568 e 1 to 0 dl 1557337534 ref 2 fl Interpret:/0/0 rc 0/0 May 08 10:45:30 fir-md1-s1 kernel: Lustre: 102739:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages May 08 10:45:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 10:45:35 fir-md1-s1 kernel: Lustre: Skipped 161 previous similar messages May 08 10:45:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 10:45:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:45:42 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557337535/real 1557337535] req@ffff982bfd724500 x1632398578402368/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557337542 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 10:45:42 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 08 10:45:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: 
Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 10:46:18 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557337570/real 1557337570] req@ffff982bfd724500 x1632398578402368/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557337577 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 10:46:18 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 08 10:46:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 10:46:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 10:46:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.0.10.3@o2ib7) May 08 10:46:59 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 08 10:47:28 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557337641/real 1557337641] req@ffff982bfd724500 x1632398578402368/t0(0) o106->fir-MDT0002@10.8.23.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557337648 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 10:47:28 fir-md1-s1 kernel: Lustre: 102514:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 08 10:48:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 08 10:48:02 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages May 08 10:48:35 fir-md1-s1 kernel: LNet: Service thread pid 102514 was inactive for 200.48s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
May 08 10:48:35 fir-md1-s1 kernel: Pid: 102514, comm: mdt00_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
May 08 10:48:35 fir-md1-s1 kernel: Call Trace:
May 08 10:48:35 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt]
May 08 10:48:35 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt]
May 08 10:48:35 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt]
May 08 10:48:35 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
May 08 10:48:35 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
May 08 10:48:35 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
May 08 10:48:35 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
May 08 10:48:35 fir-md1-s1 kernel: [] 0xffffffffffffffff
May 08 10:48:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1557337715.102514
May 08 10:48:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6438e03d-82a5-e48c-d12e-59dbfd034770 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff983610e54c00, cur 1557337724 expire 1557337574 last 1557337497 May 08 10:48:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:48:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6438e03d-82a5-e48c-d12e-59dbfd034770 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9826d2613400, cur 1557337726 expire 1557337576 last 1557337499 May 08 10:48:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 10:48:46 fir-md1-s1 kernel: LNet: Service thread pid 102514 completed after 211.11s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 08 10:52:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3307997c-545e-5ef6-bb14-9c53dab7feb6 (at 10.9.105.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff982ce8f0b800, cur 1557337926 expire 1557337776 last 1557337699 May 08 10:53:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 81ba03cb-100f-75de-7e7c-84a709257c80 (at 10.9.108.7@o2ib4) in 224 seconds. I think it's dead, and I am evicting it. exp ffff984cdc8d6000, cur 1557338002 expire 1557337852 last 1557337778 May 08 10:53:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 10:56:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 76f46682-dce3-bb75-00b0-3db97b46fa9d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff984889e77c00, cur 1557338201 expire 1557338051 last 1557337974 May 08 10:56:41 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 08 10:56:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 10:56:59 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 08 11:01:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.1.4@o2ib6) May 08 11:01:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 11:07:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7a320afd-28b5-793f-1e06-c4366476cc67 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff983be8eca800, cur 1557338850 expire 1557338700 last 1557338623 May 08 11:07:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 11:09:35 fir-md1-s1 kernel: Lustre: 102363:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 08 11:15:09 fir-md1-s1 kernel: Lustre: 102722:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 May 08 11:17:12 fir-md1-s1 kernel: Lustre: 102764:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 08 11:18:03 fir-md1-s1 kernel: Lustre: 102710:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 08 11:18:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d961e6dd-82a1-6a6a-5a23-8556e5335ccd (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff982f10297400, cur 1557339526 expire 1557339376 last 1557339299 May 08 11:18:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 11:19:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 11:19:17 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages May 08 11:22:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dcabd621-66cd-08f1-3eb7-174dbf08b0d9 (at 10.8.1.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984cb871f400, cur 1557339747 expire 1557339597 last 1557339520 May 08 11:22:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 11:23:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 98099f76-bead-25a3-00a0-0c1f8bda0881 (at 10.9.106.27@o2ib4) in 196 seconds. I think it's dead, and I am evicting it. exp ffff985c16a4c000, cur 1557339823 expire 1557339673 last 1557339627 May 08 11:23:43 fir-md1-s1 kernel: Lustre: Skipped 47 previous similar messages May 08 11:23:49 fir-md1-s1 kernel: LNetError: 101320:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 08 11:25:27 fir-md1-s1 kernel: Lustre: 102440:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 08 11:26:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 773c3f88-b988-c40f-0857-260c7cbe8aa4 (at 10.9.101.29@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff985c736d5400, cur 1557339994 expire 1557339844 last 1557339767 May 08 11:26:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 11:27:22 fir-md1-s1 kernel: Lustre: 102593:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 08 11:31:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8d951c4e-09d2-9a6e-d7d2-479856ebd844 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984a8326b800, cur 1557340264 expire 1557340114 last 1557340037 May 08 11:31:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 11:31:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a9f925c3-c6cc-1827-f5ca-23aeee30bd63 (at 10.8.23.14@o2ib6) May 08 11:31:38 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages May 08 11:32:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.23.14@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. May 08 11:32:06 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 08 11:32:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e3964c02-b0e5-74a3-d6d2-94acff5fcdaa (at 10.9.105.3@o2ib4) in 187 seconds. I think it's dead, and I am evicting it. exp ffff98497deda000, cur 1557340340 expire 1557340190 last 1557340153 May 08 11:32:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages May 08 11:32:55 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.23.14@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. May 08 11:32:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 08 11:33:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 845c42a7-5a9d-5e58-6a3c-3a68da731ba8 (at 10.9.105.3@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff984a5aeb4000, cur 1557340380 expire 1557340230 last 1557340153 May 08 11:33:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message May 08 11:33:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.108.24@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. May 08 11:33:13 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message May 08 11:34:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.108.49@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. May 08 11:35:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 52dc2b20-3bed-6e90-ce23-8012b240d434 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9831f47ab400, cur 1557340533 expire 1557340383 last 1557340306 May 08 11:35:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.108.49@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
May 08 11:37:18 fir-md1-s1 kernel: Lustre: 102527:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 08 11:37:18 fir-md1-s1 kernel: Lustre: 102527:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 08 11:38:23 fir-md1-s1 kernel: Lustre: 102532:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 08 11:39:27 fir-md1-s1 kernel: Lustre: 102435:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 May 08 11:39:27 fir-md1-s1 kernel: Lustre: 102435:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 24 previous similar messages May 08 11:41:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c09cfba5-6ff8-041f-f460-5b8951b8da80 (at 10.9.103.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff984700637000, cur 1557340885 expire 1557340735 last 1557340658 May 08 11:41:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages