Apr 26 06:56:02 fir-md1-s1 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5 Apr 26 06:56:02 fir-md1-s1 kernel: LDISKFS-fs (dm-4): file extents enabled, maximum tree depth=5 Apr 26 06:56:02 fir-md1-s1 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 26 06:56:02 fir-md1-s1 kernel: LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Apr 26 06:56:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.1.14@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 26 06:56:02 fir-md1-s1 kernel: LustreError: Skipped 2483 previous similar messages Apr 26 06:56:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.8.12.20@o2ib6 (not set up) Apr 26 06:56:02 fir-md1-s1 kernel: Lustre: Skipped 457 previous similar messages Apr 26 06:56:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Apr 26 06:56:05 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Apr 26 06:56:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: in recovery but waiting for the first client to connect Apr 26 06:56:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 06:56:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1328 clients reconnect Apr 26 06:56:05 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114 Apr 26 06:56:05 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Apr 26 06:56:06 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on Apr 26 06:56:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: in recovery but waiting for the first client to connect Apr 26 06:56:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: Will be in 
recovery for at least 2:30, or until 1328 clients reconnect Apr 26 06:56:30 fir-md1-s1 kernel: LustreError: 104340:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b3bf4a60c50 x1631588174408256/t0(0) o601->fir-MDT0000-lwp-OST0022_UUID@10.0.10.105@o2ib7:0/0 lens 336/0 e 0 to 0 dl 1556287020 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 06:56:30 fir-md1-s1 kernel: LustreError: 104340:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 4 previous similar messages Apr 26 06:56:31 fir-md1-s1 kernel: LustreError: 104751:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b41d57dcb00 x1631588174426464/t0(0) o601->fir-MDT0000-lwp-OST001a_UUID@10.0.10.105@o2ib7:1/0 lens 336/0 e 0 to 0 dl 1556287021 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 06:56:31 fir-md1-s1 kernel: LustreError: 104751:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 64 previous similar messages Apr 26 06:56:32 fir-md1-s1 kernel: LustreError: 104749:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b3c17da3f00 x1631588174436576/t0(0) o601->fir-MDT0000-lwp-OST0020_UUID@10.0.10.105@o2ib7:2/0 lens 336/0 e 0 to 0 dl 1556287022 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 06:56:32 fir-md1-s1 kernel: LustreError: 104749:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 41 previous similar messages Apr 26 06:56:35 fir-md1-s1 kernel: LustreError: 104342:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b46f0f44e00 x1631588149599456/t0(0) o601->fir-MDT0000-lwp-OST0015_UUID@10.0.10.104@o2ib7:5/0 lens 336/0 e 0 to 0 dl 1556287025 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 06:56:35 fir-md1-s1 kernel: LustreError: 104342:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 32 previous similar messages Apr 26 06:56:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect 
from 10.9.105.50@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. Apr 26 06:56:37 fir-md1-s1 kernel: LustreError: Skipped 983 previous similar messages Apr 26 06:56:39 fir-md1-s1 kernel: LustreError: 104340:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b33596a5100 x1631588174449488/t0(0) o601->fir-MDT0000-lwp-OST001e_UUID@10.0.10.105@o2ib7:9/0 lens 336/0 e 0 to 0 dl 1556287029 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 06:56:39 fir-md1-s1 kernel: LustreError: 104340:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 125 previous similar messages Apr 26 06:56:47 fir-md1-s1 kernel: LustreError: 104756:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b4907dee050 x1631588149645152/t0(0) o601->fir-MDT0000-lwp-OST0011_UUID@10.0.10.104@o2ib7:17/0 lens 336/0 e 0 to 0 dl 1556287037 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 06:56:47 fir-md1-s1 kernel: LustreError: 104756:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 513 previous similar messages Apr 26 06:57:01 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds Apr 26 06:57:01 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (5): c: 0, oc: 0, rc: 8 Apr 26 06:57:20 fir-md1-s1 kernel: LustreError: 104343:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not permitted during recovery req@ffff8b550b6e0000 x1631588200682672/t0(0) o601->fir-MDT0000-lwp-OST001d_UUID@10.0.10.106@o2ib7:20/0 lens 336/0 e 0 to 0 dl 1556287070 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 06:57:20 fir-md1-s1 kernel: LustreError: 104343:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 83 previous similar messages Apr 26 06:57:53 fir-md1-s1 kernel: LustreError: 104345:0:(tgt_handler.c:525:tgt_filter_recovery_request()) @@@ not 
permitted during recovery req@ffff8b3703302a00 x1631588172769184/t0(0) o601->fir-MDT0000-lwp-OST0012_UUID@10.0.10.103@o2ib7:23/0 lens 336/0 e 0 to 0 dl 1556287103 ref 1 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 06:57:53 fir-md1-s1 kernel: LustreError: 104345:0:(tgt_handler.c:525:tgt_filter_recovery_request()) Skipped 1046 previous similar messages Apr 26 06:57:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Denying connection for new client 53385b7b-a550-b1a8-0abe-3b8ac836eb95(at 10.8.10.20@o2ib6), waiting for 1330 known clients (1303 recovered, 26 in progress, and 0 evicted) already passed deadline 4:20 Apr 26 06:57:59 fir-md1-s1 kernel: Lustre: Skipped 1203 previous similar messages Apr 26 06:58:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: recovery is timed out, evict stale exports Apr 26 06:58:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: disconnecting 1 stale clients Apr 26 06:58:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 06:58:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 4:00, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery. Apr 26 06:58:35 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Apr 26 06:58:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 2:31, of 1330 clients 1329 recovered and 1 was evicted. Apr 26 06:58:36 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0000-lwp-MDT0002: operation quota_acquire to node 0@lo failed: rc = -11 Apr 26 06:58:36 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Apr 26 06:58:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: recovery is timed out, evict stale exports Apr 26 06:58:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: disconnecting 1 stale clients Apr 26 06:58:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: Recovery over after 2:30, of 1330 clients 1329 recovered and 1 was evicted. 
Apr 26 06:59:03 fir-md1-s1 kernel: Lustre: 105025:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b46ea76b600 x1631580609838592/t0(0) o101->dbf04e10-b1d9-69b3-7d7f-0787d71bee95@10.9.115.8@o2ib4:8/0 lens 576/3264 e 0 to 0 dl 1556287148 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 06:59:03 fir-md1-s1 kernel: Lustre: 105025:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Apr 26 06:59:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 2e220de5-7b5b-3874-45ea-64c959a50d0b (at 10.8.0.67@o2ib6) reconnecting Apr 26 06:59:09 fir-md1-s1 kernel: Lustre: Skipped 64 previous similar messages Apr 26 06:59:29 fir-md1-s1 kernel: Lustre: 105233:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b62997a5d00 x1631534835512416/t0(0) o101->b9df03f5-d7de-55e7-26be-b7cb233fd358@10.9.115.9@o2ib4:4/0 lens 576/3264 e 0 to 0 dl 1556287174 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 06:59:29 fir-md1-s1 kernel: Lustre: 105233:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Apr 26 06:59:40 fir-md1-s1 kernel: Lustre: 105080:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:1s); client may timeout. 
req@ffff8b73125e2450 x1631526698584208/t0(0) o101->0b49eccd-cda4-7bac-8560-4f28415786a3@10.9.0.62@o2ib4:8/0 lens 576/536 e 0 to 0 dl 1556287179 ref 1 fl Complete:/0/0 rc 0/0 Apr 26 06:59:40 fir-md1-s1 kernel: Lustre: 105080:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 51 previous similar messages Apr 26 07:07:20 fir-md1-s1 kernel: Lustre: 105261:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b396bb43300 x1631814741998128/t0(0) o101->718e2070-f49a-08c6-62d7-7d002d5b938d@10.9.104.69@o2ib4:25/0 lens 480/568 e 0 to 0 dl 1556287645 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 07:07:20 fir-md1-s1 kernel: Lustre: 105261:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 26 07:07:25 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.69@o2ib4 ns: mdt-fir-MDT0002_UUID lock: ffff8b52939c8240/0x378007f5d69eed2f lrc: 3/0,0 mode: PW/PW res: [0x2c001c08e:0xb5:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x60200400000020 nid: 10.9.104.69@o2ib4 remote: 0xcc66d4e7effd9d63 expref: 897 pid: 104977 timeout: 287479 lvb_type: 0 Apr 26 07:07:25 fir-md1-s1 kernel: LustreError: 105423:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7317fbfc00 ns: mdt-fir-MDT0002_UUID lock: ffff8b4ebd246c00/0x378007f5d69eed7c lrc: 3/0,0 mode: PW/PW res: [0x2c001c08e:0xb5:0x0].0x0 bits 0x40/0x0 rrc: 14 type: IBT flags: 0x50200400000020 nid: 10.9.104.69@o2ib4 remote: 0xcc66d4e7effd9d6a expref: 687 pid: 105423 timeout: 0 lvb_type: 0 Apr 26 07:07:25 fir-md1-s1 kernel: LustreError: 105423:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 18 previous similar messages Apr 26 07:07:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 1a346bf1-0637-efc4-4ac4-ab413abbac97 (at 10.9.104.69@o2ib4) Apr 26 07:07:25 fir-md1-s1 kernel: Lustre: Skipped 2831 previous 
similar messages Apr 26 07:28:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 4a8483c9-7659-178a-844e-603cf14e1ffe (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b5339446800, cur 1556288934 expire 1556288784 last 1556288707 Apr 26 07:29:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 96991bbd-d088-ba9b-da44-41ffcb01dcb5 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b70eb251c00, cur 1556288950 expire 1556288800 last 1556288723 Apr 26 07:29:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 07:39:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 587b7394-1ab0-a3d7-438e-97f0db3b2525 (at 10.8.26.8@o2ib6) Apr 26 07:40:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c0a509a5-2c41-578e-21fb-9d567e5f805f (at 10.8.26.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b71e93d1000, cur 1556289639 expire 1556289489 last 1556289412 Apr 26 07:40:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c0a509a5-2c41-578e-21fb-9d567e5f805f (at 10.8.26.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b72f61bb000, cur 1556289642 expire 1556289492 last 1556289415 Apr 26 07:50:54 fir-md1-s1 kernel: Lustre: 105380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556290247/real 1556290247] req@ffff8b731b5e7200 x1631588819617408/t0(0) o104->fir-MDT0002@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556290254 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Apr 26 07:51:02 fir-md1-s1 kernel: Lustre: 105275:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b62e9666300 x1631535626818656/t0(0) o36->6ea10cfc-48e7-f6e7-b834-4eb6674e3061@10.9.102.48@o2ib4:7/0 lens 528/448 e 1 to 0 dl 1556290267 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 07:51:08 fir-md1-s1 kernel: Lustre: 105380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556290261/real 1556290261] req@ffff8b731b5e7200 x1631588819617408/t0(0) o104->fir-MDT0002@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556290268 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 26 07:51:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6ea10cfc-48e7-f6e7-b834-4eb6674e3061 (at 10.9.102.48@o2ib4) reconnecting Apr 26 07:51:08 fir-md1-s1 kernel: Lustre: Skipped 36 previous similar messages Apr 26 07:51:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 2805d4d5-3428-11a8-983e-161adb012353 (at 10.9.102.48@o2ib4) Apr 26 07:51:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 07:51:08 fir-md1-s1 kernel: Lustre: 105380:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 26 07:51:14 fir-md1-s1 kernel: Lustre: 105069:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b36ff26dd00 x1631535771927136/t0(0) o101->2234877f-3fc8-f3e4-bcc7-41174e21aeca@10.9.102.46@o2ib4:19/0 lens 592/3264 e 0 to 0 dl 1556290279 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 07:51:14 fir-md1-s1 kernel: 
Lustre: 105069:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 26 07:51:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 6faf06bb-bb97-e5fe-c871-a5b17fa156c1 (at 10.9.102.46@o2ib4) Apr 26 07:51:20 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Apr 26 07:51:29 fir-md1-s1 kernel: Lustre: 105380:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556290282/real 1556290282] req@ffff8b731b5e7200 x1631588819617408/t0(0) o104->fir-MDT0002@10.8.9.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556290289 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 26 07:51:29 fir-md1-s1 kernel: Lustre: 105380:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Apr 26 07:51:31 fir-md1-s1 kernel: Lustre: 105373:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b7324898600 x1631558236788096/t0(0) o101->609c185f-2f75-dae6-6bc6-ab6021b0793f@10.9.108.59@o2ib4:6/0 lens 592/3264 e 0 to 0 dl 1556290296 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 07:51:31 fir-md1-s1 kernel: Lustre: 105373:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Apr 26 07:51:36 fir-md1-s1 kernel: LustreError: 105380:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.9.8@o2ib6) returned error from blocking AST (req@ffff8b731b5e7200 x1631588819617408 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8b52ac274a40/0x378007f5e0a85b29 lrc: 4/0,0 mode: PR/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 60 type: IBT flags: 0x60200400000020 nid: 10.8.9.8@o2ib6 remote: 0x5840d5a36116f887 expref: 3589 pid: 105038 timeout: 290280 lvb_type: 0 Apr 26 07:51:36 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.9.8@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Apr 26 07:51:36 fir-md1-s1 kernel: LustreError: 
20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 49s: evicting client at 10.8.9.8@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b52ac274a40/0x378007f5e0a85b29 lrc: 3/0,0 mode: PR/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 61 type: IBT flags: 0x60200400000020 nid: 10.8.9.8@o2ib6 remote: 0x5840d5a36116f887 expref: 3590 pid: 105038 timeout: 0 lvb_type: 0 Apr 26 07:52:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b8566a76-ed42-2ee8-d9fd-567ffce8f1d3 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72fcafe400, cur 1556290343 expire 1556290193 last 1556290116 Apr 26 07:52:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 08:43:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a219735d-fdd9-8136-e279-19af27191418 (at 10.8.19.8@o2ib6) Apr 26 08:43:04 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Apr 26 08:45:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 95e6fd6a-706d-ff18-fa02-0b0e9d53d014 (at 10.8.19.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72dca68c00, cur 1556293501 expire 1556293351 last 1556293274 Apr 26 08:45:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 08:45:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 95e6fd6a-706d-ff18-fa02-0b0e9d53d014 (at 10.8.19.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72d3672000, cur 1556293516 expire 1556293366 last 1556293289 Apr 26 08:45:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 08:50:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 820961b3-2494-2624-1872-4e954309f717 (at 10.8.7.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b71f375d800, cur 1556293809 expire 1556293659 last 1556293582 Apr 26 08:51:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 91789224-c4f6-53c6-8b46-442b8faa4cb6 (at 10.8.14.6@o2ib6) Apr 26 08:51:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 08:51:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 21c7addf-bec1-0d7c-9558-d59859ccf850 (at 10.8.7.23@o2ib6) Apr 26 08:51:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 08:52:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1ea1acc1-3fe4-ba60-b1d5-137c8d4a178d (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b728c234000, cur 1556293978 expire 1556293828 last 1556293751 Apr 26 08:52:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 08:53:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1ea1acc1-3fe4-ba60-b1d5-137c8d4a178d (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72d2a37c00, cur 1556293997 expire 1556293847 last 1556293770 Apr 26 08:53:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 08:54:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 69c893af-e2f4-d998-db4b-a19eba715446 (at 10.8.11.14@o2ib6) Apr 26 08:54:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Apr 26 09:00:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d4240347-d03f-94ef-97f8-f6139e140ab0 (at 10.8.20.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b72f515cc00, cur 1556294428 expire 1556294278 last 1556294201 Apr 26 09:22:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a151990e-a5e1-122d-0844-9f2f75fd2d4b (at 10.9.106.14@o2ib4) Apr 26 09:22:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 09:27:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to bbe50cfc-346c-ee8a-5e16-7b911dabb1f0 (at 10.8.14.5@o2ib6) Apr 26 09:27:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 09:27:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5a57b96c-30f5-29fc-f45d-3155572619c2 (at 10.8.14.9@o2ib6) Apr 26 09:27:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 09:29:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 829c3902-0e40-54c2-02f4-67a2a5ac7775 (at 10.8.20.5@o2ib6) Apr 26 09:29:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 09:30:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 55a9f40b-1d78-32c6-9848-4dd63314b656 (at 10.9.102.57@o2ib4) Apr 26 09:30:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 09:35:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d16504c2-48e2-66bb-8872-c6000ccd4b69 (at 10.8.26.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72bf2c1400, cur 1556296516 expire 1556296366 last 1556296289 Apr 26 09:35:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 09:35:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.26.28@o2ib6) Apr 26 09:35:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 09:35:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d16504c2-48e2-66bb-8872-c6000ccd4b69 (at 10.8.26.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b724ba20c00, cur 1556296522 expire 1556296372 last 1556296295 Apr 26 09:35:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 10:14:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ac8f395a-5784-3976-d652-54140d304414 (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b4339f02c00, cur 1556298870 expire 1556298720 last 1556298643 Apr 26 10:14:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c4f3aa61-62f8-6b68-bd9d-2e7b54f4f422 (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72fcaff400, cur 1556298883 expire 1556298733 last 1556298656 Apr 26 10:14:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c4f3aa61-62f8-6b68-bd9d-2e7b54f4f422 (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b70eb256c00, cur 1556298887 expire 1556298737 last 1556298660 Apr 26 10:17:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ac8f395a-5784-3976-d652-54140d304414 (at 10.8.0.66@o2ib6) Apr 26 10:17:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 10:19:22 fir-md1-s1 kernel: Lustre: 104331:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556299155/real 1556299155] req@ffff8b5233ea7200 x1631589118256640/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556299162 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Apr 26 10:19:22 fir-md1-s1 kernel: Lustre: 104331:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 26 10:19:29 fir-md1-s1 kernel: Lustre: 104331:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556299162/real 1556299162] req@ffff8b5233ea7200 x1631589118256640/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556299169 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 26 10:19:30 fir-md1-s1 kernel: Lustre: 
104963:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b7034750f00 x1631546064139632/t0(0) o101->c6274fea-902b-f634-b05d-2d475f88b926@10.9.104.71@o2ib4:5/0 lens 576/3264 e 1 to 0 dl 1556299175 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 10:19:30 fir-md1-s1 kernel: Lustre: 104963:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 26 10:19:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client a672d11f-a495-3615-fbb4-49f37049a724 (at 10.9.102.37@o2ib4) reconnecting Apr 26 10:19:36 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Apr 26 10:19:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b07835a9-40e7-35c8-213d-a395caed4e49 (at 10.9.102.37@o2ib4) Apr 26 10:19:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 10:19:39 fir-md1-s1 kernel: Lustre: 104692:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b52397e4500 x1631558647472736/t0(0) o101->b037e677-44d7-aaa3-7fcb-f03882ad0cd7@10.9.105.23@o2ib4:14/0 lens 576/3264 e 1 to 0 dl 1556299184 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 10:19:39 fir-md1-s1 kernel: Lustre: 104692:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Apr 26 10:19:43 fir-md1-s1 kernel: Lustre: 104331:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556299176/real 1556299176] req@ffff8b5233ea7200 x1631589118256640/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556299183 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 26 10:19:43 fir-md1-s1 kernel: Lustre: 104331:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 26 10:19:49 fir-md1-s1 kernel: Lustre: 104355:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b3530754e00 x1631546495683088/t0(0) 
o101->64c1669e-ae91-e030-4a6b-557cfa991c99@10.9.102.22@o2ib4:24/0 lens 576/3264 e 1 to 0 dl 1556299194 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 10:19:49 fir-md1-s1 kernel: Lustre: 104355:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Apr 26 10:19:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 02ff72b7-013c-f5b2-3098-e3501810341b (at 10.8.8.6@o2ib6) reconnecting Apr 26 10:19:50 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Apr 26 10:19:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to caf73e48-17c6-a69d-ea08-d331ca6690f7 (at 10.9.102.22@o2ib4) Apr 26 10:19:55 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Apr 26 10:20:04 fir-md1-s1 kernel: Lustre: 104331:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556299197/real 1556299197] req@ffff8b5233ea7200 x1631589118256640/t0(0) o104->fir-MDT0002@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556299204 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 26 10:20:04 fir-md1-s1 kernel: Lustre: 104331:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Apr 26 10:20:07 fir-md1-s1 kernel: Lustre: 105005:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4941b28c00 x1631361966945120/t0(0) o101->ae1c0bb0-8745-5d9b-0fca-9b849d57aa4e@10.8.9.10@o2ib6:12/0 lens 576/3264 e 0 to 0 dl 1556299212 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 10:20:07 fir-md1-s1 kernel: Lustre: 105005:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Apr 26 10:20:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 75c31e1e-77de-1d06-3ba1-5bf70911b79e (at 10.9.104.58@o2ib4) reconnecting Apr 26 10:20:09 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Apr 26 10:20:32 fir-md1-s1 kernel: LustreError: 104331:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error 
from blocking AST (req@ffff8b5233ea7200 x1631589118256640 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff8b333c472ac0/0x378007f6080d279b lrc: 4/0,0 mode: PR/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 309 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xc25d38cd7300d1bc expref: 264 pid: 105065 timeout: 299216 lvb_type: 0 Apr 26 10:20:32 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Apr 26 10:20:32 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 77s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff8b333c472ac0/0x378007f6080d279b lrc: 3/0,0 mode: PR/PR res: [0x2c00128dc:0x315a:0x0].0x0 bits 0x13/0x0 rrc: 311 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xc25d38cd7300d1bc expref: 265 pid: 105065 timeout: 0 lvb_type: 0 Apr 26 10:20:32 fir-md1-s1 kernel: Lustre: 104692:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:1s); client may timeout. req@ffff8b5e9e79d700 x1631535254866544/t0(0) o101->bb17aca1-57d8-f36a-a79b-bcdcd36ec002@10.8.18.20@o2ib6:1/0 lens 576/536 e 0 to 0 dl 1556299231 ref 1 fl Complete:/0/0 rc 0/0 Apr 26 10:21:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d2a4a3c8-307a-b27b-571e-a59089481ceb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b72903c4c00, cur 1556299279 expire 1556299129 last 1556299052 Apr 26 10:27:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e43eddd3-9787-62ef-1630-f21c74d3f047 (at 10.9.102.6@o2ib4) Apr 26 10:27:30 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Apr 26 10:32:59 fir-md1-s1 kernel: Lustre: 105269:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Apr 26 11:02:46 fir-md1-s1 kernel: Lustre: 104952:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b3f1d6ae900 x1631558955587728/t0(0) o101->9b7917ef-4055-daa1-69c4-53b2ed51bc97@10.8.7.27@o2ib6:21/0 lens 584/3264 e 1 to 0 dl 1556301771 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 11:02:46 fir-md1-s1 kernel: Lustre: 104952:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Apr 26 11:02:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting Apr 26 11:02:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.7.26@o2ib6) Apr 26 11:02:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 11:02:52 fir-md1-s1 kernel: Lustre: Skipped 38 previous similar messages Apr 26 11:02:56 fir-md1-s1 kernel: Lustre: 105065:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3ee2e36c00 x1631558730972832/t0(0) o101->044042bf-dd57-7ee7-fd56-cb18003c928b@10.8.7.32@o2ib6:1/0 lens 568/0 e 0 to 0 dl 1556301781 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 11:02:56 fir-md1-s1 kernel: Lustre: 105065:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Apr 26 11:03:00 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.5@o2ib6 ns: mdt-fir-MDT0000_UUID lock: 
ffff8b5ed4b58b40/0x378007f60ffc3989 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6ab:0x0].0x0 bits 0x40/0x0 rrc: 54 type: IBT flags: 0x60200400000020 nid: 10.8.27.5@o2ib6 remote: 0x54ba55412a9d5366 expref: 154 pid: 105128 timeout: 301614 lvb_type: 0 Apr 26 11:03:00 fir-md1-s1 kernel: LustreError: 105018:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7115663400 ns: mdt-fir-MDT0000_UUID lock: ffff8b48d97d7bc0/0x378007f60ffc3ddb lrc: 3/0,0 mode: --/PW res: [0x20001a1b4:0xf6ab:0x0].0x0 bits 0x40/0x0 rrc: 50 type: IBT flags: 0x54a01400000020 nid: 10.8.27.5@o2ib6 remote: 0x54ba55412a9d5374 expref: 51 pid: 105018 timeout: 0 lvb_type: 0 Apr 26 11:03:00 fir-md1-s1 kernel: LustreError: 105018:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 5 previous similar messages Apr 26 11:04:13 fir-md1-s1 kernel: Lustre: 105266:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5ddd6be600 x1631558731463696/t0(0) o101->044042bf-dd57-7ee7-fd56-cb18003c928b@10.8.7.32@o2ib6:18/0 lens 584/3264 e 0 to 0 dl 1556301858 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 11:04:13 fir-md1-s1 kernel: Lustre: 105266:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 26 11:04:18 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.27.5@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b52f37118c0/0x378007f610342301 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 39 type: IBT flags: 0x60200400000020 nid: 10.8.27.5@o2ib6 remote: 0x54ba55412a9d9b96 expref: 89 pid: 105068 timeout: 301692 lvb_type: 0 Apr 26 11:04:18 fir-md1-s1 kernel: LustreError: 105259:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b600bbc4800 ns: mdt-fir-MDT0000_UUID lock: ffff8b3f7dadaf40/0x378007f610344a3e lrc: 3/0,0 mode: PR/PR res: [0x20001a1b4:0xf6af:0x0].0x0 bits 
0x1b/0x0 rrc: 33 type: IBT flags: 0x50200000000000 nid: 10.8.27.5@o2ib6 remote: 0x54ba55412a9d9cd8 expref: 5 pid: 105259 timeout: 0 lvb_type: 0 Apr 26 11:04:18 fir-md1-s1 kernel: LustreError: 105259:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 26 11:04:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 07fb1649-64d8-ae83-4bea-7750f31fccba (at 10.8.27.5@o2ib6) Apr 26 11:04:18 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Apr 26 11:04:43 fir-md1-s1 kernel: Lustre: 105087:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b51baa39b00 x1631561508664928/t0(0) o101->d7568667-cb09-2472-4feb-3c81b8410c3e@10.8.27.5@o2ib6:18/0 lens 584/3264 e 0 to 0 dl 1556301888 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 11:04:43 fir-md1-s1 kernel: Lustre: 105087:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Apr 26 11:04:48 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.7.32@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b5279692640/0x378007f6104afd41 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x60200400000020 nid: 10.8.7.32@o2ib6 remote: 0x4f21b56e6d830c79 expref: 164 pid: 104356 timeout: 301722 lvb_type: 0 Apr 26 11:04:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b4dc4310-abd3-57a8-960f-a27b33e667d3 (at 10.8.27.7@o2ib6) reconnecting Apr 26 11:04:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6) Apr 26 11:04:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 11:05:18 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.27.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: 
ffff8b52967fbcc0/0x378007f6104afd6b lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x60200400000020 nid: 10.8.27.6@o2ib6 remote: 0x90984505c32b8f57 expref: 165 pid: 104720 timeout: 301752 lvb_type: 0 Apr 26 11:05:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting Apr 26 11:05:20 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Apr 26 11:05:45 fir-md1-s1 kernel: Lustre: 104973:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-7), not sending early reply req@ffff8b4cd78d5100 x1631641837399248/t0(0) o101->b4dc4310-abd3-57a8-960f-a27b33e667d3@10.8.27.7@o2ib6:20/0 lens 568/0 e 0 to 0 dl 1556301950 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 11:05:45 fir-md1-s1 kernel: Lustre: 104973:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Apr 26 11:05:48 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.27.7@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b37e9b65100/0x378007f6104afd95 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x60200400000020 nid: 10.8.27.7@o2ib6 remote: 0x99e7546141f87a1e expref: 169 pid: 105252 timeout: 301782 lvb_type: 0 Apr 26 11:05:48 fir-md1-s1 kernel: LustreError: 104994:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7281365c00 ns: mdt-fir-MDT0000_UUID lock: ffff8b501a3118c0/0x378007f6104afdbf lrc: 3/0,0 mode: --/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 50 type: IBT flags: 0x54a01400000020 nid: 10.8.27.7@o2ib6 remote: 0x99e7546141f87a25 expref: 75 pid: 104994 timeout: 0 lvb_type: 0 Apr 26 11:05:48 fir-md1-s1 kernel: LustreError: 104994:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 26 11:05:48 fir-md1-s1 kernel: LustreError: 
105096:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556301858, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b386aab0240/0x378007f6104afe9f lrc: 3/0,1 mode: --/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 49 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105096 timeout: 0 lvb_type: 0 Apr 26 11:05:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d793d107-3f05-1ade-e621-db73ca6f335e (at 10.8.27.7@o2ib6) Apr 26 11:05:48 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Apr 26 11:05:48 fir-md1-s1 kernel: LustreError: 105096:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 56 previous similar messages Apr 26 11:05:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting Apr 26 11:05:51 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Apr 26 11:06:18 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.7.27@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b4d1fb8b600/0x378007f6104afe7c lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 50 type: IBT flags: 0x60200400000020 nid: 10.8.7.27@o2ib6 remote: 0x27a3033c1aa0b4ff expref: 171 pid: 105018 timeout: 301812 lvb_type: 0 Apr 26 11:06:18 fir-md1-s1 kernel: LustreError: 105017:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556301888, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b6068aa4a40/0x378007f6106355fd lrc: 3/1,0 mode: --/PR res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x20/0x0 rrc: 49 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105017 timeout: 0 lvb_type: 0 Apr 26 11:06:18 fir-md1-s1 kernel: LustreError: 
105109:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72f7303c00 ns: mdt-fir-MDT0000_UUID lock: ffff8b49d0a89b00/0x378007f6104afed0 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 47 type: IBT flags: 0x50200400000020 nid: 10.8.27.6@o2ib6 remote: 0x90984505c32b8f5e expref: 5 pid: 105109 timeout: 0 lvb_type: 0 Apr 26 11:06:18 fir-md1-s1 kernel: Lustre: 105109:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:59s); client may timeout. req@ffff8b61e3254800 x1631534572569888/t0(0) o101->147c0c80-0156-d078-a77e-b8af4511cc40@10.8.27.6@o2ib6:18/0 lens 480/536 e 0 to 0 dl 1556301919 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 11:06:18 fir-md1-s1 kernel: LustreError: 105017:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 26 11:06:48 fir-md1-s1 kernel: LustreError: 105252:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556301918, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4d22ccbf00/0x378007f6107d0152 lrc: 3/1,0 mode: --/PR res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x20/0x0 rrc: 46 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105252 timeout: 0 lvb_type: 0 Apr 26 11:06:48 fir-md1-s1 kernel: LustreError: 105252:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 26 11:06:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b0b0a554-034b-3c73-26cd-3b042ee0a246 (at 10.8.7.26@o2ib6) reconnecting Apr 26 11:06:49 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Apr 26 11:07:08 fir-md1-s1 kernel: Lustre: 105124:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b417d3e0900 x1631558956108336/t0(0) o101->9b7917ef-4055-daa1-69c4-53b2ed51bc97@10.8.7.27@o2ib6:13/0 lens 584/3264 e 0 to 0 dl 
1556302033 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 11:07:08 fir-md1-s1 kernel: Lustre: 105124:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 26 11:07:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6) Apr 26 11:07:14 fir-md1-s1 kernel: Lustre: Skipped 31 previous similar messages Apr 26 11:07:18 fir-md1-s1 kernel: LustreError: 104967:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556301948, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b46e1f18480/0x378007f610a5418a lrc: 3/0,1 mode: --/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 46 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104967 timeout: 0 lvb_type: 0 Apr 26 11:07:18 fir-md1-s1 kernel: LustreError: 104967:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 26 11:07:38 fir-md1-s1 kernel: LNet: Service thread pid 105068 was inactive for 200.46s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
Apr 26 11:07:38 fir-md1-s1 kernel: Pid: 105068, comm: mdt01_036 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 11:07:38 fir-md1-s1 kernel: Call Trace:
Apr 26 11:07:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 11:07:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 11:07:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 11:07:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 11:07:38 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt]
Apr 26 11:07:38 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
Apr 26 11:07:38 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Apr 26 11:07:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 11:07:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 11:07:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 11:07:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 11:07:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 11:07:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 11:07:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 11:07:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Apr 26 11:07:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 11:07:39 fir-md1-s1 kernel: [] 0xffffffffffffffff
Apr 26 11:07:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556302059.105068
Apr 26 11:07:40 fir-md1-s1 kernel: LNet: Service thread pid 104985 was inactive for 201.32s. The thread might be hung, or it might only be slow and will resume later.
Dumping the stack trace for debugging purposes:
Apr 26 11:07:40 fir-md1-s1 kernel: Pid: 104985, comm: mdt01_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 11:07:40 fir-md1-s1 kernel: Call Trace:
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Apr 26 11:07:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 11:07:40 fir-md1-s1 kernel: [] 0xffffffffffffffff
Apr 26 11:07:40 fir-md1-s1 kernel: Pid: 105121, comm: mdt01_043 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 11:07:40 fir-md1-s1 kernel: Call Trace:
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Apr 26 11:07:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 11:07:40 fir-md1-s1 kernel: [] 0xffffffffffffffff
Apr 26 11:07:40 fir-md1-s1 kernel: Pid: 104356, comm: mdt01_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 11:07:40 fir-md1-s1 kernel: Call Trace:
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Apr 26 11:07:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 11:07:40 fir-md1-s1 kernel: [] 0xffffffffffffffff
Apr 26 11:07:40 fir-md1-s1 kernel: Pid: 105126, comm: mdt01_045 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Apr 26 11:07:40 fir-md1-s1 kernel: Call Trace:
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Apr 26 11:07:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Apr 26 11:07:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 11:07:40 fir-md1-s1 kernel: [] 0xffffffffffffffff
Apr 26 11:07:40 fir-md1-s1 kernel: LNet: Service thread pid 105407 was inactive for 202.53s.
Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 11:07:40 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 26 11:08:08 fir-md1-s1 kernel: LNet: Service thread pid 105017 was inactive for 200.23s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 11:08:08 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages Apr 26 11:08:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556302088.105017 Apr 26 11:08:10 fir-md1-s1 kernel: LNet: Service thread pid 105421 was inactive for 200.60s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 11:08:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556302090.105421 Apr 26 11:08:13 fir-md1-s1 kernel: LustreError: 105264:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556302003, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b405f3ecec0/0x378007f610dbfb93 lrc: 3/1,0 mode: --/PR res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x13/0x8 rrc: 46 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105264 timeout: 0 lvb_type: 0 Apr 26 11:08:13 fir-md1-s1 kernel: LustreError: 105264:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 26 11:08:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting Apr 26 11:08:16 fir-md1-s1 kernel: Lustre: Skipped 34 previous similar messages Apr 26 11:08:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3ffe46ec-c7a3-b9d8-f46a-3b5970e85cf0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8b491c55f800, cur 1556302114 expire 1556301964 last 1556301887 Apr 26 11:08:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 11:08:38 fir-md1-s1 kernel: LNet: Service thread pid 105252 was inactive for 200.44s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 11:08:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556302118.105252 Apr 26 11:08:41 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556302121.105308 Apr 26 11:09:09 fir-md1-s1 kernel: LNet: Service thread pid 105018 was inactive for 200.59s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 11:09:09 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Apr 26 11:09:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556302149.105018 Apr 26 11:09:38 fir-md1-s1 kernel: LNet: Service thread pid 105096 was inactive for 200.34s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 11:09:38 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 26 11:09:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556302178.105096 Apr 26 11:09:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6) Apr 26 11:09:49 fir-md1-s1 kernel: Lustre: Skipped 62 previous similar messages Apr 26 11:10:04 fir-md1-s1 kernel: LNet: Service thread pid 105264 was inactive for 200.58s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 26 11:10:04 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556302204.105264 Apr 26 11:10:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting Apr 26 11:10:52 fir-md1-s1 kernel: Lustre: Skipped 59 previous similar messages Apr 26 11:15:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6) Apr 26 11:15:00 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Apr 26 11:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting Apr 26 11:16:02 fir-md1-s1 kernel: Lustre: Skipped 119 previous similar messages Apr 26 11:22:53 fir-md1-s1 kernel: Lustre: 104332:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3cce441500 x1631558592766368/t0(0) o101->1135836c-5fb6-92af-ade3-8ef6cf526018@10.8.27.9@o2ib6:28/0 lens 584/3264 e 0 to 0 dl 1556302978 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 11:22:57 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.9@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b49432dec00/0x378007f6140b86b4 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf71b:0x0].0x0 bits 0x40/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.8.27.9@o2ib6 remote: 0xc1cb05cd56118640 expref: 692 pid: 105109 timeout: 302811 lvb_type: 0 Apr 26 11:22:57 fir-md1-s1 kernel: LustreError: 105124:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72fdf58000 ns: mdt-fir-MDT0000_UUID lock: ffff8b506cfb0fc0/0x378007f6140b86f3 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf71b:0x0].0x0 bits 0x40/0x0 rrc: 16 type: IBT flags: 0x50200400000020 nid: 10.8.27.9@o2ib6 remote: 0xc1cb05cd56118655 expref: 681 pid: 105124 timeout: 0 lvb_type: 0 Apr 26 11:22:57 fir-md1-s1 kernel: Lustre: 
105104:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (92:1027s); client may timeout. req@ffff8b62332caa00 x1631641837393632/t0(0) o101->b4dc4310-abd3-57a8-960f-a27b33e667d3@10.8.27.7@o2ib6:18/0 lens 480/536 e 0 to 0 dl 1556301950 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 11:22:57 fir-md1-s1 kernel: LNet: Service thread pid 105104 completed after 1118.99s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 26 11:22:57 fir-md1-s1 kernel: LNet: Skipped 20 previous similar messages Apr 26 11:25:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6) Apr 26 11:25:20 fir-md1-s1 kernel: Lustre: Skipped 240 previous similar messages Apr 26 11:26:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting Apr 26 11:26:22 fir-md1-s1 kernel: Lustre: Skipped 239 previous similar messages Apr 26 11:30:56 fir-md1-s1 kernel: Lustre: 105266:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8b60796d2100 x1631709816909168/t0(0) o101->79273348-f676-2905-42da-85a87e1ba2d5@10.9.107.14@o2ib4:1/0 lens 568/0 e 1 to 0 dl 1556303461 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 11:30:56 fir-md1-s1 kernel: Lustre: 105266:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Apr 26 11:31:10 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.104.21@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b50f2758480/0x378007f615e70f98 lrc: 3/0,0 mode: PW/PW res: [0x20001a225:0x8716:0x0].0x0 bits 0x40/0x0 rrc: 56 type: IBT flags: 0x60200400000020 nid: 10.9.104.21@o2ib4 remote: 0xba59162a29015b79 expref: 397 pid: 105005 timeout: 303304 lvb_type: 0 Apr 26 11:31:10 fir-md1-s1 kernel: 
LustreError: 105293:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7339b38c00 ns: mdt-fir-MDT0000_UUID lock: ffff8b4e59fae540/0x378007f615e71024 lrc: 3/0,0 mode: PW/PW res: [0x20001a225:0x8716:0x0].0x0 bits 0x40/0x0 rrc: 54 type: IBT flags: 0x50200400000020 nid: 10.9.104.21@o2ib4 remote: 0xba59162a29015b87 expref: 319 pid: 105293 timeout: 0 lvb_type: 0 Apr 26 11:31:10 fir-md1-s1 kernel: LustreError: 105293:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Apr 26 11:32:34 fir-md1-s1 kernel: Lustre: 105005:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4bb1b15a00 x1631546095292896/t0(0) o101->1f7bbeda-f291-d2ba-e680-a24cad2ce97f@10.9.104.23@o2ib4:9/0 lens 584/3264 e 0 to 0 dl 1556303559 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 11:32:34 fir-md1-s1 kernel: Lustre: 105005:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Apr 26 11:32:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.19@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b4be9783a80/0x378007f61651fc07 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x247:0x0].0x0 bits 0x40/0x0 rrc: 35 type: IBT flags: 0x60200400000020 nid: 10.9.104.19@o2ib4 remote: 0x2ed561758a8fb7a2 expref: 434 pid: 104973 timeout: 303393 lvb_type: 0 Apr 26 11:32:39 fir-md1-s1 kernel: LustreError: 105128:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b730ac7dc00 ns: mdt-fir-MDT0000_UUID lock: ffff8b600264c140/0x378007f61651fcbd lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x247:0x0].0x0 bits 0x20/0x0 rrc: 31 type: IBT flags: 0x50200000000000 nid: 10.9.104.19@o2ib4 remote: 0x2ed561758a8fb7b0 expref: 5 pid: 105128 timeout: 0 lvb_type: 0 Apr 26 11:32:39 fir-md1-s1 kernel: LustreError: 105128:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar 
message Apr 26 11:33:09 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.17@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b4c6fa17080/0x378007f61671f2e4 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x247:0x0].0x0 bits 0x40/0x0 rrc: 39 type: IBT flags: 0x60200400000020 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7bc40de0 expref: 1293 pid: 105419 timeout: 303423 lvb_type: 0 Apr 26 11:33:09 fir-md1-s1 kernel: LustreError: 105076:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72903c4400 ns: mdt-fir-MDT0000_UUID lock: ffff8b524dec4c80/0x378007f61671f8b0 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x247:0x0].0x0 bits 0x20/0x0 rrc: 35 type: IBT flags: 0x50200000000000 nid: 10.9.104.17@o2ib4 remote: 0x948905ad7bc40dee expref: 630 pid: 105076 timeout: 0 lvb_type: 0 Apr 26 11:33:09 fir-md1-s1 kernel: LustreError: 105076:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 26 11:33:39 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.23@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b36af6269c0/0x378007f6168ba35d lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x247:0x0].0x0 bits 0x40/0x0 rrc: 42 type: IBT flags: 0x60200400000020 nid: 10.9.104.23@o2ib4 remote: 0xc74f0df9eb2c58c1 expref: 450 pid: 104973 timeout: 303453 lvb_type: 0 Apr 26 11:33:39 fir-md1-s1 kernel: LustreError: 105076:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b73056a8400 ns: mdt-fir-MDT0000_UUID lock: ffff8b4e03ec5580/0x378007f6168ba413 lrc: 3/0,0 mode: --/PW res: [0x20001a221:0x247:0x0].0x0 bits 0x40/0x0 rrc: 40 type: IBT flags: 0x54a01400000020 nid: 10.9.104.23@o2ib4 remote: 0xc74f0df9eb2c58c8 expref: 351 pid: 105076 timeout: 0 lvb_type: 0 Apr 26 11:33:39 fir-md1-s1 kernel: LustreError: 105076:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) 
Skipped 1 previous similar message Apr 26 11:35:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6) Apr 26 11:35:40 fir-md1-s1 kernel: Lustre: Skipped 246 previous similar messages Apr 26 11:36:12 fir-md1-s1 kernel: Lustre: 105376:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b72d3fe2700 x1631585805340032/t0(0) o101->8a37f7b1-3efc-30e9-f8d1-739df6680357@10.9.104.19@o2ib4:17/0 lens 584/3264 e 0 to 0 dl 1556303777 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 11:36:12 fir-md1-s1 kernel: Lustre: 105376:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 38 previous similar messages Apr 26 11:36:17 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.23@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b51fab82ac0/0x378007f617381cd2 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x263:0x0].0x0 bits 0x40/0x0 rrc: 34 type: IBT flags: 0x60200400000020 nid: 10.9.104.23@o2ib4 remote: 0xc74f0df9eb2de686 expref: 186 pid: 105262 timeout: 303611 lvb_type: 0 Apr 26 11:36:17 fir-md1-s1 kernel: LustreError: 105109:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72393d8000 ns: mdt-fir-MDT0000_UUID lock: ffff8b5e822c7980/0x378007f617381e99 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x263:0x0].0x0 bits 0x20/0x0 rrc: 27 type: IBT flags: 0x50200000000000 nid: 10.9.104.23@o2ib4 remote: 0xc74f0df9eb2de68d expref: 5 pid: 105109 timeout: 0 lvb_type: 0 Apr 26 11:36:17 fir-md1-s1 kernel: LustreError: 105109:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 26 11:36:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting Apr 26 11:36:42 fir-md1-s1 kernel: Lustre: Skipped 242 previous similar messages Apr 26 11:36:47 fir-md1-s1 kernel: LustreError: 
104335:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72943b1800 ns: mdt-fir-MDT0000_UUID lock: ffff8b5ecf368000/0x378007f6175259d0 lrc: 3/0,0 mode: PR/PR res: [0x20001a221:0x263:0x0].0x0 bits 0x1b/0x0 rrc: 34 type: IBT flags: 0x50200400000020 nid: 10.9.104.24@o2ib4 remote: 0xc0a0d43c57d4ad63 expref: 5 pid: 104335 timeout: 0 lvb_type: 0 Apr 26 11:36:47 fir-md1-s1 kernel: LustreError: 104335:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Apr 26 11:37:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.22@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff8b4b8fe24140/0x378007f61784c9b2 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x263:0x0].0x0 bits 0x40/0x0 rrc: 33 type: IBT flags: 0x60200400000020 nid: 10.9.104.22@o2ib4 remote: 0x25da897db280c386 expref: 539 pid: 105293 timeout: 303701 lvb_type: 0 Apr 26 11:37:47 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 26 11:38:17 fir-md1-s1 kernel: LustreError: 105076:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72dca68400 ns: mdt-fir-MDT0000_UUID lock: ffff8b490708b3c0/0x378007f61784cad1 lrc: 3/0,0 mode: PW/PW res: [0x20001a221:0x263:0x0].0x0 bits 0x40/0x0 rrc: 29 type: IBT flags: 0x50200400000020 nid: 10.9.104.22@o2ib4 remote: 0x25da897db280c38d expref: 4 pid: 105076 timeout: 0 lvb_type: 0 Apr 26 11:38:17 fir-md1-s1 kernel: LustreError: 105076:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Apr 26 11:38:17 fir-md1-s1 kernel: Lustre: 105076:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:30s); client may timeout. 
req@ffff8b4ef3ab3f00 x1631558582920976/t0(0) o101->c1d9f0f7-d490-e556-ed11-756e6b122018@10.9.104.22@o2ib4:17/0 lens 480/536 e 0 to 0 dl 1556303867 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 11:38:42 fir-md1-s1 kernel: Lustre: 105124:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b3c8d3a7500 x1631546514875872/t0(0) o101->0f52cb94-f0df-9fca-8e07-7d7771e802ff@10.9.104.21@o2ib4:17/0 lens 584/3264 e 0 to 0 dl 1556303927 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 11:38:42 fir-md1-s1 kernel: Lustre: 105124:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 51 previous similar messages Apr 26 11:38:47 fir-md1-s1 kernel: LustreError: 104333:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556303837, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b5e4b7f2d00/0x378007f61784d5ba lrc: 3/0,1 mode: --/PW res: [0x20001a221:0x263:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104333 timeout: 0 lvb_type: 0 Apr 26 11:38:47 fir-md1-s1 kernel: LustreError: 104333:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 26 11:39:17 fir-md1-s1 kernel: LustreError: 105087:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556303867, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b49514d18c0/0x378007f617a55dc2 lrc: 3/1,0 mode: --/PR res: [0x20001a221:0x263:0x0].0x0 bits 0x20/0x0 rrc: 27 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105087 timeout: 0 lvb_type: 0 Apr 26 11:39:17 fir-md1-s1 kernel: LustreError: 105087:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 26 11:39:47 fir-md1-s1 kernel: LustreError: 
104965:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556303897, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b52fd21e0c0/0x378007f617be38a6 lrc: 3/1,0 mode: --/PR res: [0x20001a221:0x263:0x0].0x0 bits 0x13/0x8 rrc: 27 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 104965 timeout: 0 lvb_type: 0 Apr 26 11:39:47 fir-md1-s1 kernel: LustreError: 104965:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 26 11:40:37 fir-md1-s1 kernel: LNet: Service thread pid 105293 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 26 11:40:37 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 26 11:40:37 fir-md1-s1 kernel: Pid: 105293, comm: mdt01_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 11:40:37 fir-md1-s1 kernel: Call Trace: Apr 26 11:40:37 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 11:40:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 11:40:37 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 11:40:37 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 11:40:37 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 26 11:40:37 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 11:40:37 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 11:40:37 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 11:40:37 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 11:40:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 11:40:37 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 11:40:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 11:40:37 
fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 11:40:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 11:40:37 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 11:40:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556304037.105293 Apr 26 11:40:39 fir-md1-s1 kernel: LNet: Service thread pid 104973 was inactive for 201.26s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Apr 26 11:40:39 fir-md1-s1 kernel: Pid: 104973, comm: mdt01_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 11:40:39 fir-md1-s1 kernel: Call Trace: Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 11:40:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 11:40:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 11:40:39 fir-md1-s1 kernel: Pid: 104333, comm: 
mdt02_000 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 11:40:39 fir-md1-s1 kernel: Call Trace: Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 11:40:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 11:40:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 11:40:39 fir-md1-s1 kernel: Pid: 105237, comm: mdt02_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 11:40:39 fir-md1-s1 kernel: Call Trace: Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] 
mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 11:40:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 11:40:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 11:40:39 fir-md1-s1 kernel: Pid: 105262, comm: mdt00_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 11:40:39 fir-md1-s1 kernel: Call Trace: Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 11:40:39 fir-md1-s1 kernel: [] 
kthread+0xd1/0xe0
Apr 26 11:40:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Apr 26 11:40:39 fir-md1-s1 kernel: [] 0xffffffffffffffff
Apr 26 11:40:39 fir-md1-s1 kernel: LNet: Service thread pid 105104 was inactive for 202.40s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 11:41:08 fir-md1-s1 kernel: LNet: Service thread pid 105087 was inactive for 200.61s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 11:41:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556304068.105087
Apr 26 11:41:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556304074.105109
Apr 26 11:41:37 fir-md1-s1 kernel: LNet: Service thread pid 104965 was inactive for 200.35s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 11:41:37 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message
Apr 26 11:41:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556304097.104965
Apr 26 11:46:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6)
Apr 26 11:46:00 fir-md1-s1 kernel: Lustre: Skipped 354 previous similar messages
Apr 26 11:47:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting
Apr 26 11:47:02 fir-md1-s1 kernel: Lustre: Skipped 363 previous similar messages
Apr 26 11:56:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6)
Apr 26 11:56:20 fir-md1-s1 kernel: Lustre: Skipped 379 previous similar messages
Apr 26 11:57:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting
Apr 26 11:57:22 fir-md1-s1 kernel: Lustre: Skipped 379 previous similar messages
Apr 26 12:06:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6)
Apr 26 12:06:40 fir-md1-s1 kernel: Lustre: Skipped 379 previous similar messages
Apr 26 12:07:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 9b7917ef-4055-daa1-69c4-53b2ed51bc97 (at 10.8.7.27@o2ib6) reconnecting
Apr 26 12:07:42 fir-md1-s1 kernel: Lustre: Skipped 379 previous similar messages
Apr 26 12:08:45 fir-md1-s1 kernel: LustreError: 105126:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b72f61ba000 ns: mdt-fir-MDT0000_UUID lock: ffff8b52967fca40/0x378007f6104afeec lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 41 type: IBT flags: 0x50200400000020 nid: 10.8.7.27@o2ib6 remote: 0x27a3033c1aa0b506 expref: 7 pid: 105126 timeout: 0 lvb_type: 0
Apr 26 12:08:45 fir-md1-s1 kernel: Lustre: 105126:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (123:3744s); client may timeout. req@ffff8b51baa3d100 x1631558956094000/t0(0) o101->9b7917ef-4055-daa1-69c4-53b2ed51bc97@10.8.7.27@o2ib6:18/0 lens 480/536 e 0 to 0 dl 1556301981 ref 1 fl Complete:/0/0 rc -107/-107
Apr 26 12:08:45 fir-md1-s1 kernel: LNet: Service thread pid 105126 completed after 3867.20s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:08:46 fir-md1-s1 kernel: LNet: Service thread pid 105121 completed after 3868.11s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:08:46 fir-md1-s1 kernel: Lustre: 105407:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (123:3745s); client may timeout. req@ffff8b414b6ee300 x1631558956094048/t236229750852(0) o36->9b7917ef-4055-daa1-69c4-53b2ed51bc97@10.8.7.27@o2ib6:18/0 lens 488/424 e 0 to 0 dl 1556301981 ref 1 fl Complete:/0/0 rc 0/0
Apr 26 12:08:46 fir-md1-s1 kernel: LNet: Skipped 19 previous similar messages
Apr 26 12:09:11 fir-md1-s1 kernel: Lustre: 105018:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b4d2e73e000 x1631558956488688/t0(0) o101->9b7917ef-4055-daa1-69c4-53b2ed51bc97@10.8.7.27@o2ib6:16/0 lens 568/0 e 0 to 0 dl 1556305756 ref 2 fl Interpret:/0/ffffffff rc 0/-1
Apr 26 12:10:16 fir-md1-s1 kernel: LustreError: 104355:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556305726, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b38455a5340/0x378007f620a700a8 lrc: 3/0,1 mode: --/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 35 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104355 timeout: 0 lvb_type: 0
Apr 26 12:10:16 fir-md1-s1 kernel: LustreError: 104355:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages
Apr 26 12:11:16 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.7.27@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b55787d60c0/0x378007f620a70085 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 35 type: IBT flags: 0x60200400000020 nid: 10.8.7.27@o2ib6 remote: 0x27a3033c1aa0bd72 expref: 119 pid: 105128 timeout: 305710 lvb_type: 0
Apr 26 12:11:16 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message
Apr 26 12:11:16 fir-md1-s1 kernel: LustreError: 105000:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4d25a47c00 ns: mdt-fir-MDT0000_UUID lock: 
ffff8b5203f20000/0x378007f620a71649 lrc: 3/0,0 mode: PR/PR res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x20/0x0 rrc: 31 type: IBT flags: 0x50200000000000 nid: 10.8.7.27@o2ib6 remote: 0x27a3033c1aa0bd79 expref: 3 pid: 105000 timeout: 0 lvb_type: 0 Apr 26 12:11:16 fir-md1-s1 kernel: LustreError: 105000:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 5 previous similar messages Apr 26 12:11:41 fir-md1-s1 kernel: Lustre: 104994:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5627f6b300 x1631558957075280/t0(0) o101->9b7917ef-4055-daa1-69c4-53b2ed51bc97@10.8.7.27@o2ib6:16/0 lens 584/3264 e 0 to 0 dl 1556305906 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 12:11:41 fir-md1-s1 kernel: Lustre: 104994:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Apr 26 12:12:46 fir-md1-s1 kernel: LustreError: 105126:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556305876, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b36b4adc380/0x378007f62190cbdc lrc: 3/0,1 mode: --/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 36 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105126 timeout: 0 lvb_type: 0 Apr 26 12:12:46 fir-md1-s1 kernel: LustreError: 105126:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Apr 26 12:13:46 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.27.6@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b4e1f3e72c0/0x378007f62190cbd5 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x40/0x0 rrc: 36 type: IBT flags: 0x60200400000020 nid: 10.8.27.6@o2ib6 remote: 0x90984505c32bb6a9 expref: 208 pid: 105068 timeout: 305860 lvb_type: 0 Apr 26 12:13:46 fir-md1-s1 kernel: LustreError: 
105068:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b627ca5c400 ns: mdt-fir-MDT0000_UUID lock: ffff8b4e1f3e5e80/0x378007f62190d20a lrc: 3/0,0 mode: PR/PR res: [0x20001a1b4:0xf6af:0x0].0x0 bits 0x20/0x0 rrc: 33 type: IBT flags: 0x50200000000000 nid: 10.8.27.6@o2ib6 remote: 0x90984505c32bb6b0 expref: 3 pid: 105068 timeout: 0 lvb_type: 0 Apr 26 12:14:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c41887d8-667a-fcc3-3801-53e405eea2a0 (at 10.8.30.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b72dca68800, cur 1556306050 expire 1556305900 last 1556305823 Apr 26 12:14:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Apr 26 12:14:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c41887d8-667a-fcc3-3801-53e405eea2a0 (at 10.8.30.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b7271b1dc00, cur 1556306067 expire 1556305917 last 1556305840 Apr 26 12:14:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Apr 26 12:14:51 fir-md1-s1 kernel: Lustre: 105034:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b42e7e04e00 x1631609731336224/t0(0) o101->1c578c74-5128-6e3f-cdf7-83221a90bc4e@10.8.27.8@o2ib6:26/0 lens 584/3264 e 0 to 0 dl 1556306096 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 12:14:51 fir-md1-s1 kernel: Lustre: 105034:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Apr 26 12:14:55 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.27.8@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b430f4a3a80/0x378007f622756303 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 43 type: IBT flags: 0x60200400000020 nid: 10.8.27.8@o2ib6 remote: 0xd7b730106acfe7c9 expref: 122 pid: 104724 timeout: 305929 lvb_type: 0 Apr 26 
12:14:55 fir-md1-s1 kernel: LustreError: 104932:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b7135a64800 ns: mdt-fir-MDT0000_UUID lock: ffff8b4eff2d9d40/0x378007f6227565bf lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 39 type: IBT flags: 0x50200400000020 nid: 10.8.27.8@o2ib6 remote: 0xd7b730106acfe7d7 expref: 7 pid: 104932 timeout: 0 lvb_type: 0 Apr 26 12:16:25 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.27.5@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b491b782ac0/0x378007f6229a1cea lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 66 type: IBT flags: 0x60200400000020 nid: 10.8.27.5@o2ib6 remote: 0x54ba55412a9ef6da expref: 234 pid: 104939 timeout: 306019 lvb_type: 0 Apr 26 12:16:25 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 26 12:16:25 fir-md1-s1 kernel: Lustre: 105233:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:29s); client may timeout. 
req@ffff8b5d0726b000 x1631641838497408/t0(0) o55->b4dc4310-abd3-57a8-960f-a27b33e667d3@10.8.27.7@o2ib6:25/0 lens 472/192 e 0 to 0 dl 1556306156 ref 1 fl Complete:/0/0 rc -22/-22 Apr 26 12:16:25 fir-md1-s1 kernel: Lustre: 105233:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Apr 26 12:16:25 fir-md1-s1 kernel: LustreError: 105000:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306095, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b45e27d7740/0x378007f6229a1d4c lrc: 3/0,1 mode: --/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 62 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105000 timeout: 0 lvb_type: 0 Apr 26 12:16:25 fir-md1-s1 kernel: LustreError: 105000:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 19 previous similar messages Apr 26 12:16:55 fir-md1-s1 kernel: LustreError: 104330:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4f98ede000 ns: mdt-fir-MDT0000_UUID lock: ffff8b5ff8a49440/0x378007f6229a1dc3 lrc: 3/0,0 mode: PR/PR res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x20/0x0 rrc: 60 type: IBT flags: 0x50200400000020 nid: 10.8.27.7@o2ib6 remote: 0x99e7546141f9971f expref: 5 pid: 104330 timeout: 0 lvb_type: 0 Apr 26 12:16:55 fir-md1-s1 kernel: LustreError: 105286:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306125, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4d5b7d0480/0x378007f622c0a086 lrc: 3/0,1 mode: --/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 65 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105286 timeout: 0 lvb_type: 0 Apr 26 12:16:55 fir-md1-s1 kernel: LustreError: 105286:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Apr 26 12:16:55 
fir-md1-s1 kernel: LustreError: 104330:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Apr 26 12:16:55 fir-md1-s1 kernel: Lustre: 104330:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:59s); client may timeout. req@ffff8b5295ee7b00 x1631641838497536/t0(0) o101->b4dc4310-abd3-57a8-960f-a27b33e667d3@10.8.27.7@o2ib6:25/0 lens 568/1672 e 0 to 0 dl 1556306156 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 12:16:55 fir-md1-s1 kernel: Lustre: 104330:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 26 12:16:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 894a0d12-22fc-64a6-2c9f-6ae0f86acfe8 (at 10.8.7.27@o2ib6) Apr 26 12:16:58 fir-md1-s1 kernel: Lustre: Skipped 325 previous similar messages Apr 26 12:17:20 fir-md1-s1 kernel: Lustre: 104932:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b52879c7200 x1631558563477008/t0(0) o101->5af85e95-71ec-5689-9879-f126f8845b44@10.8.27.1@o2ib6:25/0 lens 576/3264 e 0 to 0 dl 1556306245 ref 2 fl Interpret:/0/0 rc 0/0 Apr 26 12:17:20 fir-md1-s1 kernel: Lustre: 104932:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 57 previous similar messages Apr 26 12:17:25 fir-md1-s1 kernel: Lustre: 104331:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:120s); client may timeout. 
req@ffff8b4c8fe60300 x1631534609682544/t0(0) o101->cec4ce3d-7421-61e4-362c-c29b7d79240a@10.8.27.10@o2ib6:25/0 lens 480/536 e 0 to 0 dl 1556306125 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 12:17:25 fir-md1-s1 kernel: Lustre: 104331:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 8 previous similar messages Apr 26 12:17:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c9131811-e205-b209-c297-3fd8b5a2cc13 (at 10.8.27.4@o2ib6) reconnecting Apr 26 12:17:56 fir-md1-s1 kernel: Lustre: Skipped 302 previous similar messages Apr 26 12:18:55 fir-md1-s1 kernel: LustreError: 105286:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306245, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b46d12fad00/0x378007f6233e2a5f lrc: 3/1,0 mode: --/PR res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x13/0x8 rrc: 52 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105286 timeout: 0 lvb_type: 0 Apr 26 12:18:55 fir-md1-s1 kernel: LustreError: 104724:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306245, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4931b52ac0/0x378007f6233e2a4a lrc: 3/0,1 mode: --/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 53 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 104724 timeout: 0 lvb_type: 0 Apr 26 12:18:55 fir-md1-s1 kernel: LustreError: 104724:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Apr 26 12:18:55 fir-md1-s1 kernel: LustreError: 105286:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 16 previous similar messages Apr 26 12:19:55 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.27.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: 
ffff8b53114406c0/0x378007f6233e29d3 lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x60200400000020 nid: 10.8.27.4@o2ib6 remote: 0x7abaf3db7f8f3097 expref: 81 pid: 105419 timeout: 306229 lvb_type: 0 Apr 26 12:19:55 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Apr 26 12:19:55 fir-md1-s1 kernel: LustreError: 105126:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b4931b76400 ns: mdt-fir-MDT0000_UUID lock: ffff8b4959d8b600/0x378007f6233e2b15 lrc: 1/0,0 mode: --/PR res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x1b/0x0 rrc: 48 type: IBT flags: 0x54a01400000020 nid: 10.8.27.4@o2ib6 remote: 0x7abaf3db7f8f309e expref: 17 pid: 105126 timeout: 0 lvb_type: 0 Apr 26 12:19:55 fir-md1-s1 kernel: LustreError: 105126:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 14 previous similar messages Apr 26 12:20:45 fir-md1-s1 kernel: LNet: Service thread pid 104948 was inactive for 200.20s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 26 12:20:45 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 26 12:20:45 fir-md1-s1 kernel: Pid: 104948, comm: mdt02_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:20:45 fir-md1-s1 kernel: Call Trace: Apr 26 12:20:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:20:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:20:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:20:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:20:45 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 26 12:20:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:20:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:20:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:20:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:20:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:20:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:20:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:20:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:20:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:20:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:20:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556306445.104948 Apr 26 12:20:47 fir-md1-s1 kernel: LNet: Service thread pid 105038 was inactive for 201.15s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 26 12:20:47 fir-md1-s1 kernel: Pid: 105038, comm: mdt00_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:20:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:20:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:20:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:20:47 fir-md1-s1 kernel: Pid: 104966, comm: mdt02_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:20:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 
[mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:20:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:20:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:20:47 fir-md1-s1 kernel: Pid: 105259, comm: mdt00_035 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:20:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 
12:20:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:20:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:20:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:20:47 fir-md1-s1 kernel: Pid: 104331, comm: mdt01_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:20:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:20:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:20:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:20:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:20:47 fir-md1-s1 kernel: LNet: Service thread pid 105005 was inactive for 202.24s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 26 12:21:25 fir-md1-s1 kernel: LustreError: 105266:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306395, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b3fddef2d00/0x378007f623b8535d lrc: 3/0,1 mode: --/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 53 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 105266 timeout: 0 lvb_type: 0 Apr 26 12:21:25 fir-md1-s1 kernel: LustreError: 105266:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages Apr 26 12:22:10 fir-md1-s1 kernel: LNet: Service thread pid 105407 was inactive for 200.38s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 12:22:10 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Apr 26 12:22:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556306530.105407 Apr 26 12:22:25 fir-md1-s1 kernel: LNet: Service thread pid 104996 completed after 299.81s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 26 12:22:25 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Apr 26 12:22:25 fir-md1-s1 kernel: Lustre: 104948:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:146s); client may timeout. 
req@ffff8b62821e5a00 x1631558574609568/t0(0) o101->c9131811-e205-b209-c297-3fd8b5a2cc13@10.8.27.4@o2ib6:25/0 lens 568/1672 e 0 to 0 dl 1556306399 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 12:22:50 fir-md1-s1 kernel: Lustre: 105128:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5695bbce00 x1631558574648592/t0(0) o101->c9131811-e205-b209-c297-3fd8b5a2cc13@10.8.27.4@o2ib6:25/0 lens 568/0 e 0 to 0 dl 1556306575 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Apr 26 12:22:50 fir-md1-s1 kernel: Lustre: 105128:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 33 previous similar messages Apr 26 12:24:55 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.27.4@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff8b613b2eba80/0x378007f6242e39ea lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 54 type: IBT flags: 0x60200400000020 nid: 10.8.27.4@o2ib6 remote: 0x7abaf3db7f8f376d expref: 164 pid: 105232 timeout: 306529 lvb_type: 0 Apr 26 12:24:55 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Apr 26 12:25:45 fir-md1-s1 kernel: LNet: Service thread pid 105286 was inactive for 200.26s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 26 12:25:45 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 26 12:25:45 fir-md1-s1 kernel: Pid: 105286, comm: mdt01_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:25:45 fir-md1-s1 kernel: Call Trace: Apr 26 12:25:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:25:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:25:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:25:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:25:45 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:25:45 fir-md1-s1 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Apr 26 12:25:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:25:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:25:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:25:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:25:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:25:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:25:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556306745.105286 Apr 26 12:25:47 fir-md1-s1 kernel: Pid: 104966, comm: mdt02_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:25:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] 
mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:25:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:25:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:25:47 fir-md1-s1 kernel: Pid: 104967, comm: mdt01_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:25:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:25:47 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:25:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:25:47 fir-md1-s1 kernel: Pid: 104329, comm: mdt00_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:25:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:25:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:25:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:25:47 fir-md1-s1 kernel: Pid: 105034, comm: mdt00_020 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:25:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:25:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:25:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:25:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:25:47 fir-md1-s1 kernel: LNet: Service thread pid 104994 was inactive for 202.28s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 26 12:26:25 fir-md1-s1 kernel: LustreError: 105018:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556306695, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4afcbc9d40/0x378007f62471693e lrc: 3/1,0 mode: --/PR res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x13/0x8 rrc: 52 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105018 timeout: 0 lvb_type: 0 Apr 26 12:26:25 fir-md1-s1 kernel: LustreError: 105018:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 26 previous similar messages Apr 26 12:26:46 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Apr 26 12:26:46 fir-md1-s1 kernel: LNetError: 20271:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.3@o2ib7 (24): c: 6, oc: 0, rc: 8 Apr 26 12:26:46 fir-md1-s1 kernel: Lustre: 105052:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1556306800/real 1556306806] req@ffff8b5131e5c800 x1631589201496320/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306807 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 Apr 26 12:26:46 fir-md1-s1 kernel: Lustre: 105052:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Apr 26 12:26:50 fir-md1-s1 kernel: Lustre: 105052:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1556306810/real 1556306810] req@ffff8b5131e5c800 x1631589201496320/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306817 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 Apr 26 12:26:50 fir-md1-s1 kernel: Lustre: 105052:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 43432 previous similar messages Apr 26 12:26:58 fir-md1-s1 kernel: Lustre: 105121:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: 
[sent 1556306818/real 1556306818] req@ffff8b469c24a100 x1631589201550448/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 1 dl 1556306825 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 Apr 26 12:26:58 fir-md1-s1 kernel: Lustre: 105121:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 161366 previous similar messages Apr 26 12:26:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to d793d107-3f05-1ade-e621-db73ca6f335e (at 10.8.27.7@o2ib6) Apr 26 12:26:59 fir-md1-s1 kernel: Lustre: Skipped 365 previous similar messages Apr 26 12:27:11 fir-md1-s1 kernel: LustreError: 105052:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.0.10.3@o2ib7) failed to reply to blocking AST (req@ffff8b5131e5c800 x1631589201496320 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff8b3bda8e7500/0x378007f62312bfcb lrc: 4/0,0 mode: PR/PR res: [0x20001a424:0x2:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b296096e5 expref: 1622852 pid: 105269 timeout: 306694 lvb_type: 0 Apr 26 12:27:11 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.0.10.3@o2ib7 was evicted due to a lock blocking callback time out: rc -110 Apr 26 12:27:11 fir-md1-s1 kernel: LustreError: 105441:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b4df6a8e600 x1631589201600832/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:27:11 fir-md1-s1 kernel: LustreError: 105441:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 26 12:27:12 fir-md1-s1 kernel: LustreError: 104335:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b5ed1f08900 x1631589201605344/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:27:16 fir-md1-s1 kernel: LustreError: 105268:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED 
req@ffff8b3a8f32da00 x1631589201619152/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:27:25 fir-md1-s1 kernel: LNet: Service thread pid 104994 completed after 299.85s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 26 12:27:25 fir-md1-s1 kernel: LustreError: 105409:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b51ac987400 ns: mdt-fir-MDT0000_UUID lock: ffff8b51f532a880/0x378007f6242e3edd lrc: 3/0,0 mode: PW/PW res: [0x20001a1b4:0xf6b7:0x0].0x0 bits 0x40/0x0 rrc: 49 type: IBT flags: 0x50200400000020 nid: 10.8.27.4@o2ib6 remote: 0x7abaf3db7f8f3774 expref: 6 pid: 105409 timeout: 0 lvb_type: 0 Apr 26 12:27:25 fir-md1-s1 kernel: LustreError: 105409:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Apr 26 12:27:25 fir-md1-s1 kernel: Lustre: 105409:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:146s); client may timeout. 
req@ffff8b4c993ca400 x1631558574648560/t0(0) o101->c9131811-e205-b209-c297-3fd8b5a2cc13@10.8.27.4@o2ib6:25/0 lens 480/536 e 0 to 0 dl 1556306699 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 12:27:25 fir-md1-s1 kernel: LustreError: 104724:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b4fbf26c200 x1631589201651264/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:27:25 fir-md1-s1 kernel: LNet: Skipped 21 previous similar messages Apr 26 12:27:41 fir-md1-s1 kernel: LustreError: 104958:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b62943e6300 x1631589201704928/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:27:49 fir-md1-s1 kernel: LustreError: 104695:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b62b10db900 x1631589201751120/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:27:49 fir-md1-s1 kernel: LustreError: 104695:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Apr 26 12:27:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b0b0a554-034b-3c73-26cd-3b042ee0a246 (at 10.8.7.26@o2ib6) reconnecting Apr 26 12:27:56 fir-md1-s1 kernel: Lustre: Skipped 368 previous similar messages Apr 26 12:28:10 fir-md1-s1 kernel: LustreError: 104967:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b4fbf26e900 x1631589201844896/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:28:10 fir-md1-s1 kernel: LustreError: 104967:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 5 previous similar messages Apr 26 12:28:44 fir-md1-s1 kernel: LustreError: 104952:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b38a343f500 x1631589201977824/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 
dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:28:44 fir-md1-s1 kernel: LustreError: 104952:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 11 previous similar messages Apr 26 12:29:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bc889374-b0ed-2371-0c2c-d84fc0dd852e (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b70d521a800, cur 1556306993 expire 1556306843 last 1556306766 Apr 26 12:30:00 fir-md1-s1 kernel: LNet: Service thread pid 105052 was inactive for 200.29s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 12:30:00 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Apr 26 12:30:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307000.105052 Apr 26 12:30:08 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 02604e59-b026-01a0-268a-d78f51fb35c5 (at 10.0.10.3@o2ib7) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8b7303a51800, cur 1556307008 expire 1556306858 last 1556306781 Apr 26 12:30:33 fir-md1-s1 kernel: LNet: Service thread pid 104335 was inactive for 200.27s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 12:30:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307033.104335 Apr 26 12:30:45 fir-md1-s1 kernel: LNet: Service thread pid 105407 was inactive for 200.30s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 26 12:30:45 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 26 12:30:45 fir-md1-s1 kernel: Pid: 105407, comm: mdt00_044 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:30:45 fir-md1-s1 kernel: Call Trace: Apr 26 12:30:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:30:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:30:45 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:30:45 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:30:45 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:30:45 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:30:45 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:30:45 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:30:45 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:30:45 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:30:45 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:30:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:30:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:30:45 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:30:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:30:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:30:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:30:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307045.105407 Apr 26 12:30:46 fir-md1-s1 kernel: Pid: 104994, comm: mdt01_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:30:46 fir-md1-s1 kernel: Call Trace: Apr 26 12:30:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:30:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] 
Apr 26 12:30:46 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:30:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:30:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:30:47 fir-md1-s1 kernel: Pid: 105121, comm: mdt01_043 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:30:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:30:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:30:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:30:47 fir-md1-s1 kernel: Pid: 105310, comm: mdt01_061 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:30:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:30:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:30:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:30:47 
fir-md1-s1 kernel: Pid: 104720, comm: mdt01_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:30:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:30:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:30:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:30:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:31:02 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307062.104958 Apr 26 12:31:10 fir-md1-s1 kernel: LNet: Service thread pid 104695 was inactive for 200.40s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Apr 26 12:31:10 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages
Apr 26 12:31:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307070.104695
Apr 26 12:31:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307079.105126
Apr 26 12:31:21 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307081.105423
Apr 26 12:31:30 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307090.104967
Apr 26 12:31:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307093.104996
Apr 26 12:31:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307096.104356
Apr 26 12:31:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307113.105266
Apr 26 12:31:55 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307115.105301
Apr 26 12:31:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307116.105247
Apr 26 12:32:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307125.104952
Apr 26 12:32:25 fir-md1-s1 kernel: LNet: Service thread pid 104332 completed after 299.85s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:32:25 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages
Apr 26 12:32:25 fir-md1-s1 kernel: Lustre: 104720:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:146s); client may timeout. req@ffff8b49cfa58f00 x1631534681073776/t0(0) o101->b0b0a554-034b-3c73-26cd-3b042ee0a246@10.8.7.26@o2ib6:25/0 lens 568/1672 e 0 to 0 dl 1556306999 ref 1 fl Complete:/0/0 rc -107/-107
Apr 26 12:32:25 fir-md1-s1 kernel: Lustre: 104720:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages
Apr 26 12:32:28 fir-md1-s1 kernel: LNet: Service thread pid 105124 was inactive for 200.38s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Apr 26 12:32:28 fir-md1-s1 kernel: LNet: Skipped 12 previous similar messages
Apr 26 12:32:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307148.105124
Apr 26 12:32:31 fir-md1-s1 kernel: LNet: Service thread pid 104996 completed after 258.42s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:32:31 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages
Apr 26 12:32:31 fir-md1-s1 kernel: LustreError: 105027:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b5ef948ad00 x1631589202851536/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Apr 26 12:32:31 fir-md1-s1 kernel: LustreError: 105027:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
Apr 26 12:32:56 fir-md1-s1 kernel: Lustre: 104334:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8b5de3437b00 x1631301857577744/t0(0) o36->1b540135-ca45-f45a-a248-eb5cb55b2913@10.8.29.4@o2ib6:1/0 lens 488/3152 e 0 to 0 dl 1556307181 ref 2 fl Interpret:/0/0 rc 0/0
Apr 26 12:32:56 fir-md1-s1 kernel: Lustre: 104334:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 83 previous similar messages
Apr 26 12:33:31 fir-md1-s1 kernel: LNet: Service thread pid 104958 completed after 349.44s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:33:41 fir-md1-s1 kernel: LNet: Service thread pid 105423 completed after 340.04s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:33:48 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.0.10.3@o2ib7 ns: mdt-fir-MDT0000_UUID lock: ffff8b34eca26c00/0x378007f609f2e2ad lrc: 3/0,0 mode: PR/PR res: [0x200016350:0x2fb:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.0.10.3@o2ib7 remote: 0xbbb5b46b239aceee expref: 494508 pid: 105244 timeout: 307062 lvb_type: 0
Apr 26 12:33:48 fir-md1-s1 kernel: LustreError: 20444:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 30 previous similar messages
Apr 26 12:34:36 fir-md1-s1 kernel: LNet: Service thread pid 104356 completed after 379.63s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:34:36 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message
Apr 26 12:35:11 fir-md1-s1 kernel: LustreError: 105423:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556307221, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff8b4cc0bbbcc0/0x378007f625e86d73 lrc: 3/0,1 mode: --/PW res: [0x200011521:0x1b9:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 105423 timeout: 0 lvb_type: 0
Apr 26 12:35:11 fir-md1-s1 kernel: LustreError: 105423:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 52 previous similar messages
Apr 26 12:35:12 fir-md1-s1 kernel: LNet: Service thread pid 105126 completed after 433.57s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Apr 26 12:35:27 fir-md1-s1 kernel: LustreError: 104932:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b4df8adc500 x1631589204198016/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:35:27 fir-md1-s1 kernel: LustreError: 104932:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Apr 26 12:35:57 fir-md1-s1 kernel: Lustre: 105252:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:30s); client may timeout. req@ffff8b37a65bec00 x1631558751530624/t0(0) o55->7af6ae8a-9232-e3ea-cd7f-f47578cc5c43@10.8.7.29@o2ib6:27/0 lens 472/192 e 0 to 0 dl 1556307327 ref 1 fl Complete:/0/0 rc -22/-22 Apr 26 12:35:57 fir-md1-s1 kernel: LustreError: 105421:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8b730e35a000 ns: mdt-fir-MDT0000_UUID lock: ffff8b4315e98900/0x378007f62621216d lrc: 3/0,0 mode: PR/PR res: [0x20001a98d:0x20:0x0].0x0 bits 0x1b/0x0 rrc: 43 type: IBT flags: 0x50200000000000 nid: 10.8.7.30@o2ib6 remote: 0x4262b8f4ed9b0749 expref: 5 pid: 105421 timeout: 0 lvb_type: 0 Apr 26 12:35:57 fir-md1-s1 kernel: LustreError: 105421:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 8 previous similar messages Apr 26 12:35:57 fir-md1-s1 kernel: Lustre: 105252:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Apr 26 12:36:39 fir-md1-s1 kernel: LNet: Service thread pid 105027 was inactive for 200.68s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 26 12:36:39 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Apr 26 12:36:39 fir-md1-s1 kernel: Pid: 105027, comm: mdt02_016 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:36:39 fir-md1-s1 kernel: Call Trace: Apr 26 12:36:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:36:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:36:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:36:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:36:39 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 26 12:36:39 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 26 12:36:39 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 26 12:36:39 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 26 12:36:39 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 26 12:36:39 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 26 12:36:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:36:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:36:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:36:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:36:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:36:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:36:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307399.105027 Apr 26 12:36:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.7.35@o2ib6) Apr 26 12:36:59 fir-md1-s1 kernel: Lustre: Skipped 436 previous similar messages Apr 26 12:37:01 fir-md1-s1 kernel: LNet: Service thread pid 105423 was inactive for 200.48s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 26 12:37:01 fir-md1-s1 kernel: Pid: 105423, comm: mdt01_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:37:01 fir-md1-s1 kernel: Call Trace: Apr 26 12:37:01 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:37:01 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:37:01 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:37:01 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:37:01 fir-md1-s1 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Apr 26 12:37:01 fir-md1-s1 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Apr 26 12:37:01 fir-md1-s1 kernel: [] mdt_reint_setattr+0x6c8/0x1340 [mdt] Apr 26 12:37:01 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Apr 26 12:37:01 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Apr 26 12:37:01 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Apr 26 12:37:01 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:37:01 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:37:01 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:37:01 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:37:01 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:37:01 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:37:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307421.105423 Apr 26 12:37:20 fir-md1-s1 kernel: LNet: Service thread pid 105052 completed after 639.92s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Apr 26 12:37:20 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Apr 26 12:37:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c36ce969-5c05-7bd7-e7d6-c7bd68e2bfa0 (at 10.8.7.34@o2ib6) reconnecting Apr 26 12:37:59 fir-md1-s1 kernel: Lustre: Skipped 436 previous similar messages Apr 26 12:38:18 fir-md1-s1 kernel: Lustre: 105237:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:3631s); client may timeout. req@ffff8b3ed065bc00 x1631558582921008/t0(0) o101->c1d9f0f7-d490-e556-ed11-756e6b122018@10.9.104.22@o2ib4:17/0 lens 568/1672 e 0 to 0 dl 1556303867 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 12:38:18 fir-md1-s1 kernel: Lustre: 105237:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Apr 26 12:39:28 fir-md1-s1 kernel: LNet: Service thread pid 104695 completed after 698.71s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Apr 26 12:39:28 fir-md1-s1 kernel: LNet: Skipped 12 previous similar messages Apr 26 12:39:47 fir-md1-s1 kernel: LNet: Service thread pid 105306 was inactive for 200.14s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 26 12:39:47 fir-md1-s1 kernel: Pid: 105306, comm: mdt01_059 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:39:47 fir-md1-s1 kernel: Call Trace: Apr 26 12:39:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:39:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:39:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:39:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:39:47 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 26 12:39:47 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 26 12:39:47 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 26 12:39:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:39:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:39:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:39:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:39:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:39:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:39:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:39:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:39:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:39:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:39:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307587.105306 Apr 26 12:39:48 fir-md1-s1 kernel: Pid: 105418, comm: mdt02_046 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:39:48 fir-md1-s1 kernel: Call Trace: Apr 26 12:39:48 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:39:48 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:39:48 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:39:48 fir-md1-s1 kernel: Pid: 105441, comm: mdt01_068 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:39:48 fir-md1-s1 kernel: Call Trace: Apr 26 12:39:48 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:39:48 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:39:48 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:39:48 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:39:48 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:39:48 fir-md1-s1 kernel: LNet: Service thread pid 105244 was inactive for 201.57s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Apr 26 12:40:13 fir-md1-s1 kernel: LustreError: 105264:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8b37e3b6d700 x1631589206972704/t0(0) o104->fir-MDT0000@10.0.10.3@o2ib7:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Apr 26 12:40:13 fir-md1-s1 kernel: LustreError: 105264:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Apr 26 12:40:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307617.105018 Apr 26 12:40:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307618.105269 Apr 26 12:40:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307647.105415 Apr 26 12:40:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307651.105000 Apr 26 12:41:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307677.105106 Apr 26 12:41:39 fir-md1-s1 kernel: LNet: Service thread pid 104389 was inactive for 200.22s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Apr 26 12:41:39 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Apr 26 12:41:39 fir-md1-s1 kernel: Pid: 104389, comm: mdt01_004 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:41:39 fir-md1-s1 kernel: Call Trace: Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:41:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:41:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:41:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307699.104389 Apr 26 12:41:39 fir-md1-s1 kernel: Pid: 105399, comm: mdt03_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:41:39 fir-md1-s1 kernel: Call Trace: Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] 
Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:41:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:41:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:41:39 fir-md1-s1 kernel: Pid: 104966, comm: mdt02_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:41:39 fir-md1-s1 kernel: Call Trace: Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:41:39 
fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:41:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:41:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:41:39 fir-md1-s1 kernel: Pid: 105074, comm: mdt00_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:41:39 fir-md1-s1 kernel: Call Trace: Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:41:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:41:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:41:39 fir-md1-s1 kernel: Pid: 105100, comm: mdt00_026 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Apr 26 12:41:39 fir-md1-s1 kernel: Call Trace: Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x4e5/0x890 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Apr 26 12:41:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Apr 26 12:41:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Apr 26 12:41:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Apr 26 12:41:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1556307700.105421 Apr 26 12:43:18 fir-md1-s1 kernel: Lustre: 105100:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:146s); client may timeout. req@ffff8b3827eeec00 x1631585556997728/t0(0) o101->16749711-2a27-479b-83fc-14b2199ba6af@10.9.104.18@o2ib4:18/0 lens 568/1672 e 0 to 0 dl 1556307652 ref 1 fl Complete:/0/0 rc -107/-107 Apr 26 12:43:18 fir-md1-s1 kernel: Lustre: 105100:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Apr 26 12:47:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.4@o2ib6) Apr 26 12:47:15 fir-md1-s1 kernel: Lustre: Skipped 398 previous similar messages Apr 26 12:48:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client ee5a039a-f4f0-2c7c-2913-32a6706993e3 (at 10.8.7.31@o2ib6) reconnecting Apr 26 12:48:17 fir-md1-s1 kernel: Lustre: Skipped 378 previous similar messages