Jun 17 03:32:00 nbp8-mds1 kernel: [1707856.433666] Lustre: MGS: Connection restored to 3f03c423-1840-ea18-b183-5810f0854488 (at 10.149.15.123@o2ib313)
Jun 17 03:32:00 nbp8-mds1 kernel: [1707856.433671] Lustre: Skipped 136 previous similar messages
Jun 17 03:33:31 nbp8-mds1 kernel: [1707947.327187] Lustre: MGS: Connection restored to da34485e-ac06-af04-cb21-1e65c256a8a7 (at 10.149.9.189@o2ib313)
Jun 17 03:33:31 nbp8-mds1 kernel: [1707947.327193] Lustre: Skipped 59 previous similar messages
Jun 17 03:37:42 nbp8-mds1 kernel: [1708198.584685] Lustre: MGS: Connection restored to cebc4208-ab17-f2aa-b56a-685ad0e97a47 (at 10.149.11.50@o2ib313)
Jun 17 03:37:42 nbp8-mds1 kernel: [1708198.584691] Lustre: Skipped 59 previous similar messages
Jun 17 03:45:02 nbp8-mds1 kernel: [1708637.988796] Lustre: MGS: Connection restored to ec3ab769-223f-1867-089c-1469d913821e (at 10.151.28.59@o2ib)
Jun 17 03:45:02 nbp8-mds1 kernel: [1708637.988802] Lustre: Skipped 157 previous similar messages
Jun 17 03:59:40 nbp8-mds1 kernel: [1709516.073036] Lustre: MGS: Connection restored to 5e2d78f6-e5a2-32de-9ea8-06fd724c24b1 (at 10.149.8.65@o2ib313)
Jun 17 03:59:40 nbp8-mds1 kernel: [1709516.073042] Lustre: Skipped 483 previous similar messages
Jun 17 04:11:02 nbp8-mds1 rsyslogd: -- MARK --
Jun 17 04:12:10 nbp8-mds1 kernel: [1710266.752189] Lustre: MGS: Connection restored to ea0e1eaf-c60c-b1b6-0cf6-5e46c5c4e68d (at 10.151.43.151@o2ib)
Jun 17 04:12:10 nbp8-mds1 kernel: [1710266.752195] Lustre: Skipped 259 previous similar messages
Jun 17 04:22:13 nbp8-mds1 kernel: [1710869.215964] Lustre: MGS: Connection restored to 841cf1ab-782c-6bfc-ed2f-baacffbb4609 (at 10.149.9.1@o2ib313)
Jun 17 04:22:13 nbp8-mds1 kernel: [1710869.215970] Lustre: Skipped 117 previous similar messages
Jun 17 04:33:12 nbp8-mds1 kernel: [1711527.918373] Lustre: MGS: Connection restored to 3f03c423-1840-ea18-b183-5810f0854488 (at 10.149.15.123@o2ib313)
Jun 17 04:33:12 nbp8-mds1 kernel: [1711527.918379] Lustre: Skipped 283 previous similar messages
Jun 17 04:45:22 nbp8-mds1 kernel: [1712258.731608] Lustre: MGS: Connection restored to 021360a6-5521-c8d4-5c13-166bdfe52cbe (at 10.149.14.17@o2ib313)
Jun 17 04:45:22 nbp8-mds1 kernel: [1712258.731614] Lustre: Skipped 123 previous similar messages
Jun 17 05:06:19 nbp8-mds1 kernel: [1713515.267083] Lustre: MGS: Connection restored to a150e7a1-c1c5-59f5-2edb-9ec578dcb214 (at 10.141.2.55@o2ib417)
Jun 17 05:06:19 nbp8-mds1 kernel: [1713515.267089] Lustre: Skipped 7 previous similar messages
Jun 17 05:08:43 nbp8-mds1 kernel: [1713659.458153] Lustre: MGS: Connection restored to 1ad03996-bd4d-dc9d-8702-4fa5a72a887b (at 10.149.3.123@o2ib313)
Jun 17 05:08:43 nbp8-mds1 kernel: [1713659.458159] Lustre: Skipped 103 previous similar messages
Jun 17 05:11:02 nbp8-mds1 rsyslogd: -- MARK --
Jun 17 05:12:16 nbp8-mds1 kernel: [1713872.422569] Lustre: MGS: Connection restored to 4a6c80ae-ff59-fb66-d8f2-e34280309352 (at 10.149.3.69@o2ib313)
Jun 17 05:12:16 nbp8-mds1 kernel: [1713872.422575] Lustre: Skipped 1 previous similar message
Jun 17 05:25:43 nbp8-mds1 kernel: [1714679.187617] Lustre: MGS: Connection restored to a041e8bc-df77-f027-14ae-2d51c857c6e8 (at 10.151.33.200@o2ib)
Jun 17 05:25:43 nbp8-mds1 kernel: [1714679.187622] Lustre: Skipped 77 previous similar messages
Jun 17 05:36:36 nbp8-mds1 kernel: [1715332.556456] Lustre: MGS: Connection restored to 7a3f88cb-67f1-e00d-64d8-28fcdb6e9ff3 (at 10.151.32.86@o2ib)
Jun 17 05:36:36 nbp8-mds1 kernel: [1715332.556461] Lustre: Skipped 307 previous similar messages
Jun 17 05:46:52 nbp8-mds1 kernel: [1715948.553456] Lustre: MGS: Connection restored to 47fc7a36-f569-c415-5345-985f9af11063 (at 10.149.14.14@o2ib313)
Jun 17 05:46:52 nbp8-mds1 kernel: [1715948.553461] Lustre: Skipped 301 previous similar messages
Jun 17 05:57:20 nbp8-mds1 kernel: [1716576.612516] Lustre: MGS: Connection restored to caeb7583-63d5-2bad-2d03-7f3205fab2f3 (at 10.149.14.191@o2ib313)
Jun 17 05:57:20 nbp8-mds1 kernel: [1716576.612521] Lustre: Skipped 127 previous similar messages
Jun 17 06:08:54 nbp8-mds1 kernel: [1717270.649929] Lustre: MGS: Connection restored to 9570049d-2c13-9daf-a0f6-650a73255625 (at 10.149.9.105@o2ib313)
Jun 17 06:08:54 nbp8-mds1 kernel: [1717270.649934] Lustre: Skipped 109 previous similar messages
Jun 17 06:11:03 nbp8-mds1 rsyslogd: -- MARK --
Jun 17 06:26:48 nbp8-mds1 kernel: [1718344.462907] Lustre: MGS: Connection restored to 1ad03996-bd4d-dc9d-8702-4fa5a72a887b (at 10.149.3.123@o2ib313)
Jun 17 06:26:48 nbp8-mds1 kernel: [1718344.462913] Lustre: Skipped 127 previous similar messages
Jun 17 06:36:50 nbp8-mds1 kernel: [1718947.070688] Lustre: MGS: Connection restored to 5d27c3ee-9ac6-d726-2e49-843e88e1f3e6 (at 10.151.35.186@o2ib)
Jun 17 06:36:50 nbp8-mds1 kernel: [1718947.070693] Lustre: Skipped 71 previous similar messages
Jun 17 06:48:56 nbp8-mds1 kernel: [1719672.549273] Lustre: MGS: Connection restored to e0f50835-2494-502f-06a0-cde678c617b6 (at 10.141.5.148@o2ib417)
Jun 17 06:48:56 nbp8-mds1 kernel: [1719672.549279] Lustre: Skipped 91 previous similar messages
Jun 17 07:04:12 nbp8-mds1 kernel: [1720588.938605] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib)
Jun 17 07:04:12 nbp8-mds1 kernel: [1720588.938611] Lustre: Skipped 21 previous similar messages
Jun 17 07:11:03 nbp8-mds1 rsyslogd: -- MARK --
Jun 17 07:14:45 nbp8-mds1 kernel: [1721221.426037] Lustre: MGS: Connection restored to 3f83b21b-03e5-a508-d946-828a0828fac2 (at 10.151.36.139@o2ib)
Jun 17 07:14:45 nbp8-mds1 kernel: [1721221.426043] Lustre: Skipped 421 previous similar messages
Jun 17 07:28:05 nbp8-mds1 kernel: [1722021.685190] Lustre: MGS: Connection restored to edaba6e3-78f8-144e-e9db-c09208e37bb9 (at 10.151.30.236@o2ib)
Jun 17 07:28:05 nbp8-mds1 kernel: [1722021.685196] Lustre: Skipped 39 previous similar messages
Jun 17 07:40:22 nbp8-mds1 kernel: [1722758.624536] Lustre: MGS: Connection restored to e302294f-18ad-f68b-f18f-317eeaa1ea47 (at 10.141.6.70@o2ib417)
Jun 17 07:40:22 nbp8-mds1 kernel: [1722758.624542] Lustre: Skipped 345 previous similar messages
Jun 17 07:50:35 nbp8-mds1 kernel: [1723372.175916] Lustre: MGS: Connection restored to f5b799e8-b43b-41d2-8f60-25a01c95d4aa (at 10.151.36.87@o2ib)
Jun 17 07:50:35 nbp8-mds1 kernel: [1723372.175922] Lustre: Skipped 95 previous similar messages
Jun 17 08:01:02 nbp8-mds1 kernel: [1723998.518885] Lustre: MGS: Connection restored to 40e791bf-6a5f-b363-87d9-81f1842abf17 (at 10.149.15.98@o2ib313)
Jun 17 08:01:02 nbp8-mds1 kernel: [1723998.518890] Lustre: Skipped 71 previous similar messages
Jun 17 08:11:03 nbp8-mds1 rsyslogd: -- MARK --
Jun 17 08:11:41 nbp8-mds1 kernel: [1724638.157814] Lustre: MGS: Connection restored to 2011d8eb-0d92-9f56-8746-8c659f519f86 (at 10.151.29.137@o2ib)
Jun 17 08:11:41 nbp8-mds1 kernel: [1724638.157819] Lustre: Skipped 1103 previous similar messages
Jun 17 08:21:47 nbp8-mds1 kernel: [1725244.227365] Lustre: MGS: Connection restored to 2c3cf03c-a5e4-3604-be1d-a5148cb632d2 (at 10.151.32.145@o2ib)
Jun 17 08:21:47 nbp8-mds1 kernel: [1725244.227371] Lustre: Skipped 153 previous similar messages
Jun 17 08:33:42 nbp8-mds1 kernel: [1725958.780847] Lustre: MGS: Connection restored to d4d3c562-fd68-3735-e4bd-3a7ba4004370 (at 10.151.33.27@o2ib)
Jun 17 08:33:42 nbp8-mds1 kernel: [1725958.780852] Lustre: Skipped 101 previous similar messages
Jun 17 08:44:57 nbp8-mds1 kernel: [1726634.229555] Lustre: MGS: Connection restored to 9fd728a5-912f-4d08-ead5-8e763e526542 (at 10.151.36.174@o2ib)
Jun 17 08:44:57 nbp8-mds1 kernel: [1726634.229561] Lustre: Skipped 309 previous similar messages
Jun 17 08:55:02 nbp8-mds1 kernel: [1727238.531746] Lustre: MGS: Connection restored to 4dcbae0d-e3c2-58e0-62b6-3c9f6d3f7e41 (at 10.151.0.77@o2ib)
Jun 17 08:55:02 nbp8-mds1 kernel: [1727238.531752] Lustre: Skipped 335 previous similar messages
Jun 17 09:05:11 nbp8-mds1 kernel: [1727847.945182] Lustre: MGS: Connection restored to 9370f8c3-9f59-68f4-782e-8b7d544b42ec (at 10.149.9.201@o2ib313)
Jun 17 09:05:11 nbp8-mds1 kernel: [1727847.945188] Lustre: Skipped 103 previous similar messages
Jun 17 09:06:02 nbp8-mds1 nscd: 2557 monitored file `/etc/passwd` was moved, removing watch
Jun 17 09:06:02 nbp8-mds1 nscd: 2557 monitored file `/etc/passwd` was created, adding watch
Jun 17 09:06:02 nbp8-mds1 nscd: 2557 failed to add file watch `/etc/passwd`: Permission denied
Jun 17 09:11:03 nbp8-mds1 rsyslogd: -- MARK --
Jun 17 09:16:02 nbp8-mds1 nscd: 2557 monitored file `/etc/passwd` changed (mtime)
Jun 17 09:16:02 nbp8-mds1 nscd: 2557 monitoring file `/etc/passwd` (107)
Jun 17 09:16:02 nbp8-mds1 nscd: 2557 monitoring directory `/etc` (2)
Jun 17 09:18:30 nbp8-mds1 kernel: [1728647.381202] Lustre: MGS: Connection restored to eb95c421-6a2f-cc81-4928-4dc43feeea69 (at 10.151.39.70@o2ib)
Jun 17 09:18:30 nbp8-mds1 kernel: [1728647.381207] Lustre: Skipped 127 previous similar messages
Jun 17 09:31:02 nbp8-mds1 kernel: [1729399.390864] Lustre: MGS: Connection restored to d34a7f1e-aabe-16b7-559c-465c7c0a38c6 (at 10.151.28.208@o2ib)
Jun 17 09:31:02 nbp8-mds1 kernel: [1729399.390869] Lustre: Skipped 37 previous similar messages
Jun 17 09:41:12 nbp8-mds1 kernel: [1730009.244599] Lustre: MGS: Connection restored to 66fae42f-8bef-6f1b-6967-4279fb40aec1 (at 10.151.56.131@o2ib)
Jun 17 09:41:12 nbp8-mds1 kernel: [1730009.244606] Lustre: Skipped 69 previous similar messages
Jun 17 09:45:13 nbp8-mds1 kernel: [1730250.486811] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds
Jun 17 09:45:13 nbp8-mds1 kernel: [1730250.520605] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.33@o2ib (299): c: 32, oc: 0, rc: 32
Jun 17 09:45:14 nbp8-mds1 sec[2849]: SEC_EVENT |msg lustre rdma timeout|nid 10.151.30.33@o2ib
Jun 17 09:51:54 nbp8-mds1 kernel: [1730650.761076] Lustre: MGS: Connection restored to 13cee315-7090-62b7-1a51-871b4aa1faa4 (at 10.151.32.193@o2ib)
Jun 17 09:51:54 nbp8-mds1 kernel: [1730650.761082] Lustre: Skipped 335 previous similar messages
Jun 17 09:54:59 nbp8-mds1 kernel: [1730835.599762] Lustre: MGS: haven't heard from client 1e932b72-a3dd-f47f-da45-47a804001c06 (at 10.151.27.18@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3e6b63c00, cur 1592412899 expire 1592412749 last 1592412672
Jun 17 09:55:16 nbp8-mds1 kernel: [1730852.604571] Lustre: nbp8-MDT0000: haven't heard from client c7b34585-cc74-1c2c-40fe-9075ac8b5089 (at 10.151.27.18@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89791d20a800, cur 1592412916 expire 1592412766 last 1592412689
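The eviction entries above follow a fixed format ("haven't heard from client <uuid> (at <nid>) in <N> seconds"), so an eviction storm like the one that develops below can be tallied mechanically. A minimal Python sketch; the log path, regex, and function name are illustrative assumptions, not site tooling:

    import re
    from collections import Counter

    # Matches the Lustre eviction message seen in this log, e.g.:
    #   "haven't heard from client 1e932b72-... (at 10.151.27.18@o2ib) in 227 seconds."
    EVICT_RE = re.compile(
        r"haven't heard from client (?P<uuid>\S+) "
        r"\(at (?P<nid>\S+)\) in (?P<idle>\d+) seconds"
    )

    def tally_evictions(path):
        """Count evictions per LNet network (the part of the NID after '@')."""
        per_net = Counter()
        with open(path, errors="replace") as f:
            for line in f:
                m = EVICT_RE.search(line)
                if m:
                    per_net[m.group("nid").split("@", 1)[1]] += 1
        return per_net

    print(tally_evictions("/var/log/messages"))  # path is an assumption

Grouping by network (o2ib vs. o2ib313 vs. o2ib417) makes it easy to see whether evictions cluster on one fabric, which is the first question an RDMA timeout like the 10.151.30.33@o2ib one above raises.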
Jun 17 10:07:15 nbp8-mds1 kernel: [1731572.522112] Lustre: MGS: Connection restored to 2b506c31-eb65-784f-6409-3dc96b4dc60e (at 10.151.1.33@o2ib)
Jun 17 10:07:15 nbp8-mds1 kernel: [1731572.522118] Lustre: Skipped 313 previous similar messages
Jun 17 10:11:03 nbp8-mds1 rsyslogd: -- MARK --
Jun 17 10:24:22 nbp8-mds1 kernel: [1732599.578090] Lustre: MGS: Connection restored to ecce0121-d1eb-dc13-6ad3-1e09eee96902 (at 10.151.56.123@o2ib)
Jun 17 10:24:22 nbp8-mds1 kernel: [1732599.578095] Lustre: Skipped 47 previous similar messages
Jun 17 10:33:39 nbp8-mds1 kernel: [1733156.398461] LNet: 3587:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.27.18@o2ib version 12/12 incarnation 1591240360579877/1592415213938152
Jun 17 10:40:09 nbp8-mds1 kernel: [1733545.696711] Lustre: nbp8-MDT0000: haven't heard from client 5997e6f3-c410-a94e-9c24-a0da28d411de (at 10.149.5.142@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a4071b0000, cur 1592415609 expire 1592415459 last 1592415382
Jun 17 10:41:00 nbp8-mds1 kernel: [1733597.282981] Lustre: 7282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
Jun 17 10:41:00 nbp8-mds1 kernel: [1733597.282981] req@ffff898dedb43180 x1669086539350432/t0(0) o101->66395ede-9692-99a5-5d3d-e0ac4b5112e8@10.149.8.10@o2ib313:645/0 lens 576/3264 e 0 to 0 dl 1592415690 ref 2 fl Interpret:/0/0 rc 0/0
Jun 17 10:41:00 nbp8-mds1 kernel: [1733597.380262] Lustre: 7282:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 742 previous similar messages
Jun 17 10:41:25 nbp8-mds1 kernel: [1733621.703481] Lustre: nbp8-MDT0000: haven't heard from client 11340999-b905-b7d6-f2d7-9285f417ce91 (at 10.151.36.73@o2ib) in 182 seconds. I think it's dead, and I am evicting it. exp ffff89a218b9b400, cur 1592415685 expire 1592415535 last 1592415503
Jun 17 10:41:25 nbp8-mds1 kernel: [1733621.776161] Lustre: Skipped 4 previous similar messages
Jun 17 10:41:25 nbp8-mds1 sec[2849]: Evaluating code '4 > 1500' and setting variable '%num'
Jun 17 10:41:25 nbp8-mds1 sec[2849]: Variable '%num' set to ''
Jun 17 10:41:40 nbp8-mds1 kernel: [1733637.364442] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
Jun 17 10:41:40 nbp8-mds1 kernel: [1733637.364442] req@ffff8991f5a8bf00 x1669049263033280/t0(0) o39->5997e6f3-c410-a94e-9c24-a0da28d411de@10.149.5.142@o2ib313:685/0 lens 224/0 e 0 to 0 dl 1592415730 ref 2 fl New:/0/ffffffff rc 0/-1
Jun 17 10:41:40 nbp8-mds1 kernel: [1733637.461725] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 856 previous similar messages
Jun 17 10:42:41 nbp8-mds1 kernel: [1733697.704079] Lustre: nbp8-MDT0000: haven't heard from client b76468c7-3989-b05d-9305-643cf2f5be62 (at 10.149.4.58@o2ib313) in 220 seconds. I think it's dead, and I am evicting it. exp ffff898f3f7fa800, cur 1592415761 expire 1592415611 last 1592415541
Jun 17 10:42:55 nbp8-mds1 kernel: [1733712.535190] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
Jun 17 10:42:55 nbp8-mds1 kernel: [1733712.535190] req@ffff8991bd70d580 x1669163757377968/t0(0) o101->eac32f30-d800-3921-5c6f-9013827f1450@10.151.37.27@o2ib:5/0 lens 4512/0 e 0 to 0 dl 1592415805 ref 2 fl New:/0/ffffffff rc 0/-1
Jun 17 10:42:55 nbp8-mds1 kernel: [1733712.631614] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 655 previous similar messages
Jun 17 10:43:57 nbp8-mds1 kernel: [1733773.705474] Lustre: nbp8-MDT0000: haven't heard from client 83812c74-19f8-4e3f-67fc-e131af375fcb (at 10.151.23.129@o2ib) in 211 seconds. I think it's dead, and I am evicting it. exp ffff89904aac7c00, cur 1592415837 expire 1592415687 last 1592415626
Jun 17 10:43:57 nbp8-mds1 kernel: [1733773.778449] Lustre: Skipped 37 previous similar messages
Jun 17 10:43:57 nbp8-mds1 sec[2849]: Evaluating code '37 > 1500' and setting variable '%num'
Jun 17 10:43:57 nbp8-mds1 sec[2849]: Variable '%num' set to ''
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.763535] LNet: Service thread pid 8583 was inactive for 550.95s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.819635] LNet: Skipped 3 previous similar messages
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.836820] Pid: 8583, comm: mdt01_038 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.836824] Call Trace:
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.836837] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842104] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842129] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842140] [] mdt_object_lock_internal+0x70/0x360 [mdt]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842150] [] mdt_object_lock_try+0x27/0xb0 [mdt]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842162] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842172] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842183] [] mdt_intent_policy+0x435/0xd80 [mdt]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842213] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842246] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842295] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842337] [] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842371] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842404] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842409] [] kthread+0xd1/0xe0
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842414] [] ret_from_fork_nospec_end+0x0/0x39
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842438] [] 0xffffffffffffffff
Jun 17 10:44:54 nbp8-mds1 kernel: [1733830.842441] LustreError: dumping log to /tmp/lustre-log.1592415894.8583
Jun 17 10:44:54 nbp8-mds1 sec[2849]: SEC_EVENT |msg lustre hung thread
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.659688] LNet: Service thread pid 10531 was inactive for 551.86s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.716060] Pid: 10531, comm: mdt01_071 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.716061] Call Trace:
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.716074] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739386] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739412] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739423] [] mdt_object_lock_internal+0x70/0x360 [mdt]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739433] [] mdt_object_lock_try+0x27/0xb0 [mdt]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739442] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739453] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739463] [] mdt_intent_policy+0x435/0xd80 [mdt]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739489] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739519] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739575] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739619] [] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739656] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739691] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739700] [] kthread+0xd1/0xe0
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739708] [] ret_from_fork_nospec_end+0x0/0x39
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739737] [] 0xffffffffffffffff
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739742] Pid: 14089, comm: mdt01_118 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 10:44:55 nbp8-mds1 kernel: [1733831.739746] Call Trace:
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739777] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739804] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739816] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739826] [] mdt_object_lock_internal+0x70/0x360 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739836] [] mdt_object_lock_try+0x27/0xb0 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739847] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739858] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739868] [] mdt_intent_policy+0x435/0xd80 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739893] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739922] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739961] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.739998] [] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740031] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740063] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740067] [] kthread+0xd1/0xe0
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740069] [] ret_from_fork_nospec_end+0x0/0x39
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740073] [] 0xffffffffffffffff
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740077] Pid: 12642, comm: mdt01_091 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740077] Call Trace:
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740108] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740136] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740148] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740158] [] mdt_object_lock_internal+0x70/0x360 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740167] [] mdt_object_lock_try+0x27/0xb0 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740177] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740188] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740200] [] mdt_intent_policy+0x435/0xd80 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740224] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740253] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740291] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740328] [] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740361] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740392] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740395] [] kthread+0xd1/0xe0
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740399] [] ret_from_fork_nospec_end+0x0/0x39
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740402] [] 0xffffffffffffffff
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740405] Pid: 8616, comm: mdt01_066 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740406] Call Trace:
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740436] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740464] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740476] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740485] [] mdt_object_lock_internal+0x70/0x360 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740495] [] mdt_object_lock_try+0x27/0xb0 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740505] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740517] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740531] [] mdt_intent_policy+0x435/0xd80 [mdt]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740555] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740584] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740622] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740658] [] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740692] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740722] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740725] [] kthread+0xd1/0xe0
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740727] [] ret_from_fork_nospec_end+0x0/0x39
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740733] [] 0xffffffffffffffff
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740737] LNet: Service thread pid 12627 was inactive for 551.95s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Jun 17 10:44:56 nbp8-mds1 kernel: [1733831.740739] LNet: Skipped 16 previous similar messages
Jun 17 10:44:56 nbp8-mds1 kernel: [1733832.811609] LNet: Service thread pid 12622 was inactive for 551.96s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one.
Jun 17 10:44:56 nbp8-mds1 kernel: [1733832.811611] LNet: Skipped 54 previous similar messages
Jun 17 10:44:56 nbp8-mds1 kernel: [1733832.811614] LustreError: dumping log to /tmp/lustre-log.1592415896.12622
Jun 17 10:44:56 nbp8-mds1 sec[2849]: SEC_EVENT |msg lustre hung thread
Jun 17 10:45:13 nbp8-mds1 kernel: [1733849.709090] Lustre: nbp8-MDT0000: haven't heard from client 945826d6-1cc2-c138-9c19-be2127c9e8c7 (at 10.151.23.132@o2ib) in 215 seconds. I think it's dead, and I am evicting it. exp ffff898844f47c00, cur 1592415913 expire 1592415763 last 1592415698
Jun 17 10:45:13 nbp8-mds1 kernel: [1733849.782064] Lustre: Skipped 10 previous similar messages
Jun 17 10:45:13 nbp8-mds1 sec[2849]: Evaluating code '10 > 1500' and setting variable '%num'
Jun 17 10:45:13 nbp8-mds1 sec[2849]: Variable '%num' set to ''
Jun 17 10:45:26 nbp8-mds1 kernel: [1733862.822678] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
Jun 17 10:45:26 nbp8-mds1 kernel: [1733862.822678] req@ffff8991e5774800 x1668974497630960/t0(0) o101->40911b12-9f8c-db4a-a3f4-9e466fb4fcfc@10.151.1.216@o2ib:156/0 lens 576/0 e 1 to 0 dl 1592415956 ref 2 fl New:/0/ffffffff rc 0/-1
Jun 17 10:45:26 nbp8-mds1 kernel: [1733862.919383] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1036 previous similar messages
Jun 17 10:46:05 nbp8-mds1 kernel: [1733901.873070] Lustre: nbp8-MDT0000: Client 0ba6a333-f5c0-5356-5aec-6f99b1acdebd (at 10.149.7.39@o2ib313) reconnecting
Jun 17 10:46:05 nbp8-mds1 kernel: [1733901.907994] Lustre: Skipped 61 previous similar messages
Jun 17 10:46:05 nbp8-mds1 kernel: [1733901.926080] Lustre: nbp8-MDT0000: Connection restored to cdf18a0f-9a51-219f-ba03-6b67c78c4853 (at 10.149.7.39@o2ib313)
Jun 17 10:46:05 nbp8-mds1 kernel: [1733901.926087] Lustre: Skipped 47 previous similar messages
Jun 17 10:46:29 nbp8-mds1 kernel: [1733925.710333] Lustre: nbp8-MDT0000: haven't heard from client fae4f8e2-306d-e882-eb5d-07df5dc200b8 (at 10.151.37.155@o2ib) in 225 seconds. I think it's dead, and I am evicting it. exp ffff8986343c4400, cur 1592415989 expire 1592415839 last 1592415764
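Note that the kernel rate-limits repeated messages: a printed line is often followed by "Skipped N previous similar messages" (1036 suppressed early-reply warnings in the entry above), so counting printed lines alone badly understates event volume. A rough accounting sketch, assuming as this log consistently shows that the skip line directly follows the message it summarizes (function name is illustrative):

    import re

    SKIP_RE = re.compile(r"Skipped (\d+) previous similar message")

    def true_count(lines, needle):
        """Printed occurrences of `needle`, plus the counts carried by any
        'Skipped N previous similar messages' line directly after them."""
        total = 0
        for i, line in enumerate(lines):
            if needle in line:
                total += 1
                if i + 1 < len(lines):
                    m = SKIP_RE.search(lines[i + 1])
                    if m:
                        total += int(m.group(1))
        return total

    # e.g. true_count(log_lines, "Connection restored") counts every
    # reconnect, not just the ones that escaped the rate limiter.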
Jun 17 10:46:29 nbp8-mds1 kernel: [1733925.783298] Lustre: Skipped 2 previous similar messages
Jun 17 10:46:29 nbp8-mds1 sec[2849]: Evaluating code '2 > 1500' and setting variable '%num'
Jun 17 10:46:29 nbp8-mds1 sec[2849]: Variable '%num' set to ''
Jun 17 10:46:43 nbp8-mds1 kernel: [1733939.745110] Lustre: nbp8-MDT0000: Client 03ce6a1a-6a6b-2649-dd95-a57298fc7fda (at 10.151.6.36@o2ib) reconnecting
Jun 17 10:46:43 nbp8-mds1 kernel: [1733939.779171] Lustre: Skipped 160 previous similar messages
Jun 17 10:47:20 nbp8-mds1 kernel: [1733977.470727] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib)
Jun 17 10:47:20 nbp8-mds1 kernel: [1733977.470733] Lustre: Skipped 254 previous similar messages
Jun 17 10:47:45 nbp8-mds1 kernel: [1734001.712802] Lustre: nbp8-MDT0000: haven't heard from client d8f25440-1dfb-1352-9765-08ded09334aa (at 10.151.37.61@o2ib) in 157 seconds. I think it's dead, and I am evicting it. exp ffff8986a620dc00, cur 1592416065 expire 1592415915 last 1592415908
Jun 17 10:47:45 nbp8-mds1 kernel: [1734001.785511] Lustre: Skipped 13 previous similar messages
Jun 17 10:47:45 nbp8-mds1 sec[2849]: Evaluating code '13 > 1500' and setting variable '%num'
Jun 17 10:47:45 nbp8-mds1 sec[2849]: Variable '%num' set to ''
Jun 17 10:48:00 nbp8-mds1 kernel: [1734016.761994] Lustre: nbp8-MDT0000: Client eac32f30-d800-3921-5c6f-9013827f1450 (at 10.151.37.27@o2ib) reconnecting
Jun 17 10:48:00 nbp8-mds1 kernel: [1734016.796351] Lustre: Skipped 99 previous similar messages
Jun 17 10:49:01 nbp8-mds1 kernel: [1734077.718774] Lustre: nbp8-MDT0000: haven't heard from client 62d3fc4d-d60c-4dbf-ae7d-d583c5c419e1 (at 10.151.34.52@o2ib) in 192 seconds. I think it's dead, and I am evicting it. exp ffff89858dee5000, cur 1592416141 expire 1592415991 last 1592415949
Jun 17 10:49:28 nbp8-mds1 kernel: [1734104.795516] LustreError: 7286:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592415343, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff89918b689f80/0xa22cee4044517cdc lrc: 3/1,0 mode: --/PR res: [0x3608b98c3:0x4:0x0].0x0 bits 0x13/0x8 rrc: 334 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 7286 timeout: 0 lvb_type: 0
Jun 17 10:49:28 nbp8-mds1 kernel: [1734104.925683] LustreError: 7286:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 19 previous similar messages
Jun 17 10:49:28 nbp8-mds1 kernel: [1734105.458542] LustreError: 8631:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592415343, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff898a93669f80/0xa22cee4044522c37 lrc: 3/1,0 mode: --/PR res: [0x3608b98c3:0x4:0x0].0x0 bits 0x13/0x8 rrc: 334 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 8631 timeout: 0 lvb_type: 0
Jun 17 10:49:28 nbp8-mds1 kernel: [1734105.588750] LustreError: 8631:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 28 previous similar messages
Jun 17 10:49:51 nbp8-mds1 kernel: [1734128.009849] Lustre: nbp8-MDT0000: Connection restored to 729ebfc5-544c-a648-aaf1-77306e42f8c5 (at 10.151.36.128@o2ib)
Jun 17 10:49:51 nbp8-mds1 kernel: [1734128.009854] Lustre: Skipped 239 previous similar messages
Jun 17 10:50:17 nbp8-mds1 kernel: [1734153.719568] Lustre: nbp8-MDT0000: haven't heard from client edbfca03-3409-4d90-fedf-37f430169f98 (at 10.149.1.79@o2ib313) in 209 seconds. I think it's dead, and I am evicting it. exp ffff898d42289400, cur 1592416217 expire 1592416067 last 1592416008
Jun 17 10:50:28 nbp8-mds1 kernel: [1734165.471751] Lustre: 7282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
Jun 17 10:50:28 nbp8-mds1 kernel: [1734165.471751] req@ffff8993303d8000 x1669041722258448/t0(0) o39->329cb904-1e19-8b4c-cf6a-6660ff47c5dc@10.151.6.226@o2ib:458/0 lens 224/0 e 0 to 0 dl 1592416258 ref 2 fl New:/0/ffffffff rc 0/-1
Jun 17 10:50:28 nbp8-mds1 kernel: [1734165.567891] Lustre: 7282:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 100 previous similar messages
Jun 17 10:50:31 nbp8-mds1 kernel: [1734168.369067] Lustre: nbp8-MDT0000: Client 40911b12-9f8c-db4a-a3f4-9e466fb4fcfc (at 10.151.1.216@o2ib) reconnecting
Jun 17 10:50:31 nbp8-mds1 kernel: [1734168.403417] Lustre: Skipped 150 previous similar messages
Jun 17 10:52:33 nbp8-mds1 kernel: [1734290.153199] LustreError: 8077:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 718s
Jun 17 10:52:51 nbp8-mds1 kernel: [1734307.739638] Lustre: nbp8-MDT0000: haven't heard from client 214f0134-faba-649a-f177-0a63ec98ae59 (at 10.151.34.143@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89985c23a000, cur 1592416371 expire 1592416221 last 1592416144
Jun 17 10:52:51 nbp8-mds1 kernel: [1734307.812609] Lustre: Skipped 10 previous similar messages
Jun 17 10:52:51 nbp8-mds1 sec[2849]: Evaluating code '10 > 1500' and setting variable '%num'
Jun 17 10:52:51 nbp8-mds1 sec[2849]: Variable '%num' set to ''
Jun 17 10:54:54 nbp8-mds1 kernel: [1734431.009759] Lustre: nbp8-MDT0000: Connection restored to 33d2242f-c131-e51a-3f9f-11cf79808631 (at 10.149.15.108@o2ib313)
Jun 17 10:54:54 nbp8-mds1 kernel: [1734431.009764] Lustre: Skipped 95 previous similar messages
Jun 17 10:55:33 nbp8-mds1 kernel: [1734470.391532] LustreError: 8835:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 898s
Jun 17 10:55:34 nbp8-mds1 kernel: [1734471.271685] Lustre: nbp8-MDT0000: Client 28a4f7e2-f02c-23ec-27be-833bf3eeab69 (at 10.151.5.25@o2ib) reconnecting
Jun 17 10:55:34 nbp8-mds1 kernel: [1734471.305755] Lustre: Skipped 35 previous similar messages
Jun 17 11:00:23 nbp8-mds1 kernel: [1734759.740985] Lustre: nbp8-MDT0000: haven't heard from client 9c63e84a-7b49-4fb8-ed0b-0627921b3a3d (at 10.149.9.91@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8990077c6c00, cur 1592416823 expire 1592416673 last 1592416596
Jun 17 11:00:23 nbp8-mds1 kernel: [1734759.814489] Lustre: Skipped 14 previous similar messages
Jun 17 11:00:23 nbp8-mds1 sec[2849]: Evaluating code '14 > 1500' and setting variable '%num'
Jun 17 11:00:23 nbp8-mds1 sec[2849]: Variable '%num' set to ''
Jun 17 11:00:33 nbp8-mds1 kernel: [1734769.761857] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
Jun 17 11:00:33 nbp8-mds1 kernel: [1734769.761857] req@ffff8991b4e39200 x1669115549137312/t0(0) o101->e184e0fb-473c-03a2-5ac3-d3780676a78b@10.149.11.215@o2ib313:308/0 lens 576/0 e 0 to 0 dl 1592416863 ref 2 fl New:/0/ffffffff rc 0/-1
Jun 17 11:00:33 nbp8-mds1 kernel: [1734769.859706] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1944 previous similar messages
Jun 17 11:04:58 nbp8-mds1 kernel: [1735034.916876] Lustre: nbp8-MDT0000: Connection restored to 1c02c475-14e6-44be-d6c6-739117dbd963 (at 10.151.59.213@o2ib)
Jun 17 11:04:58 nbp8-mds1 kernel: [1735034.916881] Lustre: Skipped 619 previous similar messages
Jun 17 11:05:38 nbp8-mds1 kernel: [1735075.538183] Lustre: nbp8-MDT0000: Client e184e0fb-473c-03a2-5ac3-d3780676a78b (at 10.149.11.215@o2ib313) reconnecting
Jun 17 11:05:38 nbp8-mds1 kernel: [1735075.573674] Lustre: Skipped 540 previous similar messages
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.042090] LNet: Service thread pid 14081 was inactive for 550.73s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.098472] LNet: Skipped 3 previous similar messages
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.115673] Pid: 14081, comm: mdt00_086 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.115677] Call Trace:
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.115691] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.120966] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.120988] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.120999] [] mdt_object_lock_internal+0x70/0x360 [mdt]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121009] [] mdt_object_lock_try+0x27/0xb0 [mdt]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121018] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121029] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121039] [] mdt_intent_policy+0x435/0xd80 [mdt]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121064] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121094] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121146] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121186] [] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121219] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121250] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121256] [] kthread+0xd1/0xe0
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121259] [] ret_from_fork_nospec_end+0x0/0x39
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121283] [] 0xffffffffffffffff
Jun 17 11:05:41 nbp8-mds1 kernel: [1735078.121286] LustreError: dumping log to /tmp/lustre-log.1592417141.14081
Jun 17 11:05:41 nbp8-mds1 sec[2849]: SEC_EVENT |msg lustre hung thread
Jun 17 11:05:42 nbp8-mds1 kernel: [1735078.980762] LNet: Service thread pid 8530 was inactive for 551.67s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.036848] Pid: 8530, comm: mdt00_022 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.036850] Call Trace:
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.036863] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060163] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060191] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060202] [] mdt_object_lock_internal+0x70/0x360 [mdt]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060212] [] mdt_object_lock_try+0x27/0xb0 [mdt]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060222] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060232] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060242] [] mdt_intent_policy+0x435/0xd80 [mdt]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060267] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060299] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060349] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060389] [] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060422] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060454] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060458] [] kthread+0xd1/0xe0
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060462] [] ret_from_fork_nospec_end+0x0/0x39
Jun 17 11:05:42 nbp8-mds1 kernel: [1735079.060487] [] 0xffffffffffffffff
Jun 17 11:05:42 nbp8-mds1 sec[2849]: SEC_EVENT |msg lustre hung thread
Jun 17 11:07:33 nbp8-mds1 kernel: [1735190.222015] LustreError: 10113:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 602s
Jun 17 11:10:15 nbp8-mds1 kernel: [1735352.313155] LustreError: 8530:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592416590, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff897cc4a15c40/0xa22cee40458d1a86 lrc: 3/1,0 mode: --/PR res: [0x3608b98c3:0x4:0x0].0x0 bits 0x13/0x8 rrc: 339 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 8530 timeout: 0 lvb_type: 0
Jun 17 11:10:15 nbp8-mds1 kernel: [1735352.443341] LustreError: 8530:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 104 previous similar messages
Jun 17 11:10:33 nbp8-mds1 kernel: [1735370.456666] LustreError: 10375:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 782s
Jun 17 11:10:38 nbp8-mds1 kernel: [1735375.028012] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
Jun 17 11:10:38 nbp8-mds1 kernel: [1735375.028012] req@ffff89942969b180 x1668977129372752/t0(0) o101->1a05bdd0-fad2-f66f-f29d-1d009da267a4@10.151.16.154@o2ib:158/0 lens 576/0 e 2 to 0 dl 1592417468 ref 2 fl New:/2/ffffffff rc 0/-1
Jun 17 11:10:38 nbp8-mds1 kernel: [1735375.125006] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1926 previous similar messages
Jun 17 11:11:03 nbp8-mds1 rsyslogd: -- MARK --
Jun 17 11:11:39 nbp8-mds1 kernel: [1735435.769792] Lustre: nbp8-MDT0000: haven't heard from client a0eebb83-f273-e9ca-f04a-205407bdf5bd (at 10.151.32.100@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e83a3400, cur 1592417499 expire 1592417349 last 1592417272
Jun 17 11:11:39 nbp8-mds1 kernel: [1735435.842770] Lustre: Skipped 44 previous similar messages
Jun 17 11:11:39 nbp8-mds1 sec[2849]: Evaluating code '44 > 1500' and setting variable '%num'
Jun 17 11:11:39 nbp8-mds1 sec[2849]: Variable '%num' set to ''
Jun 17 11:13:33 nbp8-mds1 kernel: [1735550.695233] LustreError: 10630:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 962s
Jun 17 11:14:59 nbp8-mds1 kernel: [1735635.943043] Lustre: nbp8-MDT0000: Connection restored to 9e00432c-6d8f-8557-8f0b-1596b413df2b (at 10.149.12.97@o2ib313)
Jun 17 11:14:59 nbp8-mds1 kernel: [1735635.943048] Lustre: Skipped 658 previous similar messages
Jun 17 11:15:43 nbp8-mds1 kernel: [1735680.485715] Lustre: nbp8-MDT0000: Client 1a05bdd0-fad2-f66f-f29d-1d009da267a4 (at 10.151.16.154@o2ib) reconnecting
Jun 17 11:15:43 nbp8-mds1 kernel: [1735680.520383] Lustre: Skipped 571 previous similar messages
Jun 17 11:16:34 nbp8-mds1 kernel: [1735730.929726] LustreError: 10876:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1142s
Jun 17 11:19:33 nbp8-mds1 kernel: [1735910.172387] LustreError: 11244:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1322s
Jun 17 11:20:43 nbp8-mds1 kernel: [1735980.296157] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply
Jun 17 11:20:43 nbp8-mds1 kernel: [1735980.296157] req@ffff8992c9e11b00 x1669343847086976/t0(0) o101->8bc31fdc-e04b-e241-4399-c121f5caff47@10.141.6.183@o2ib417:8/0 lens 584/0 e 1 to 0 dl 1592418073 ref 2 fl New:/2/ffffffff rc 0/-1
Jun 17 11:20:43 nbp8-mds1 kernel: [1735980.393160] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1218 previous similar messages
Jun 17 11:22:33 nbp8-mds1 kernel: [1736090.410842] LustreError: 11510:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1502s
Jun 17 11:25:03 nbp8-mds1 kernel: [1736240.235044] Lustre: nbp8-MDT0000: Connection restored to 9bde665a-7b50-d628-3a6d-766a2b0deb49 (at 10.151.15.25@o2ib)
Jun 17 11:25:03 nbp8-mds1 kernel: [1736240.235049] Lustre: Skipped 601 previous similar messages
Jun 17 11:25:48 nbp8-mds1 kernel: [1736285.041652] Lustre: nbp8-MDT0000: Client 1f465a23-dc5e-ba3f-f1db-394df16ee6c0 (at 10.141.6.158@o2ib417) reconnecting
Jun 17 11:25:48 nbp8-mds1 kernel: [1736285.076875] Lustre: Skipped 471 previous similar messages
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.318737] LNet: Service thread pid 8565 was inactive for 551.51s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.374836] Pid: 8565, comm: mdt00_039 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.374838] Call Trace:
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.374852] [] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398161] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398181] [] mdt_object_local_lock+0x50b/0xb20 [mdt]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398191] [] mdt_object_lock_internal+0x70/0x360 [mdt]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398201] [] mdt_object_lock_try+0x27/0xb0 [mdt]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398210] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398221] [] mdt_intent_getattr+0x2b5/0x480 [mdt]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398231] [] mdt_intent_policy+0x435/0xd80 [mdt]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398255] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398286] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398331] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398370] [] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398403] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398434] [] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398440] [] kthread+0xd1/0xe0
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398445] [] ret_from_fork_nospec_end+0x0/0x39
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398470] [] 0xffffffffffffffff
Jun 17 11:26:28 nbp8-mds1 kernel: [1736325.398471] LustreError: dumping log to /tmp/lustre-log.1592418388.8565
Jun 17 11:26:29 nbp8-mds1 sec[2849]: SEC_EVENT |msg lustre hung thread
Jun 17 11:28:34 nbp8-mds1 kernel: [1736450.848084] LustreError: 12578:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1862s
Jun 17 11:28:34 nbp8-mds1 kernel: [1736450.885877] LustreError: 12578:0:(service.c:3361:ptlrpc_svcpt_health_check()) Skipped 1 previous similar message
Jun 17 11:30:43 nbp8-mds1 kernel: [1736580.496134] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/-125), not sending early reply
Jun 17 11:30:43 nbp8-mds1 kernel: [1736580.496134] req@ffff899aa3a1d580 x1669030209379968/t0(0) o101->29e9480a-ada2-cd39-cdb4-881b0782ae58@10.149.8.106@o2ib313:608/0 lens 576/0 e 0 to 0 dl 1592418673 ref 2 fl New:/0/ffffffff rc 0/-1
Jun 17 11:30:43 nbp8-mds1 kernel: [1736580.594281] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1683 previous similar messages
Jun 17 11:31:02 nbp8-mds1 kernel: [1736598.816763] LustreError: 8565:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592417837, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff89838a43f980/0xa22cee4046a11c08 lrc: 3/1,0 mode: --/PR res: [0x3608b98c3:0x4:0x0].0x0 bits 0x13/0x8 rrc: 342 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 8565 timeout: 0 lvb_type: 0
Jun 17 11:35:03 nbp8-mds1 kernel: [1736840.768134] Lustre: MGS: Connection restored to 02eaa079-4544-1890-7c12-58e638416bfb (at 10.151.59.210@o2ib)
Jun 17 11:35:03 nbp8-mds1 kernel: [1736840.768139] Lustre: Skipped 634 previous similar messages
Jun 17 11:35:49 nbp8-mds1 kernel: [1736885.794635] Lustre: nbp8-MDT0000: Client 29e9480a-ada2-cd39-cdb4-881b0782ae58 (at 10.149.8.106@o2ib313) reconnecting
Jun 17 11:35:49 nbp8-mds1 kernel: [1736885.829853] Lustre: Skipped 579 previous similar messages
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] microcode: microcode updated early to revision 0x714, date = 2018-05-08
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] Initializing cgroup subsys cpu
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] Initializing cgroup subsys cpuacct
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] Linux version 3.10.0-1062.12.1.el7_lustre2124.x86_64 (root@swbuild1) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Tue Mar 17 13:32:19 PDT 2020
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] Command line: BOOT_IMAGE=(tftp)/boot/lustre-2.12.4-200319/vmlinuz-3.10.0-1062.12.1.el7_lustre2124.x86_64 MAC=00:1e:67:65:25:1d ROOTFS=disk IMAGE_PENDING=0 IMAGE=lustre-2.12.4-200319 SLOT=1 console=ttyS0,38400n8 NODETYPE=service NODE_ID=service200 SLOTCOUNT=2 MONITOR_CONSOLE=yes ro root=dhcp intel_idle.max_cstate=1 processor.max_cstate=1 selinux=0 net.ifnames=0 biosdevname=0 numa_balancing=disable elevator=cfq TRANSPORT=udpcast IMAGESERVER=172.27.0.1 TTL=1 MCAST_RDV_ADDR=224.0.0.1 FLAMETHROWER_DIRECTORY_PORTBASE=9000 START_TIMEOUT=30 RECEIVE_TIMEOUT=5 crashkernel=256M predictable_net_names=0
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] e820: BIOS-provided physical RAM map:
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000099bff] usable
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000099c00-0x000000000009ffff] reserved
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bad26fff] usable
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bad27000-0x00000000baf8efff] reserved
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000baf8f000-0x00000000bafc4fff] usable
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bafc5000-0x00000000bafd9fff] reserved
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bafda000-0x00000000bb3d3fff] usable
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bb3d4000-0x00000000bdd2efff] reserved
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bdd2f000-0x00000000bddccfff] ACPI NVS
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bddcd000-0x00000000bdea0fff] ACPI data
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bdea1000-0x00000000bdf2efff] ACPI NVS
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bdf2f000-0x00000000bdfa9fff] ACPI data
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bdfaa000-0x00000000bdffffff] usable
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000be000000-0x00000000cfffffff] reserved
Jun 17 12:10:06 nbp8-mds1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
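At 12:10:06 the kernel uptime counter in the bracketed prefix resets from roughly 1736885 s to 0.000000, i.e. the MDS was rebooted at that point and the boot messages above are the new kernel coming up. A small sketch that detects such restarts by watching the uptime counter go backwards (the function name is illustrative):

    import re

    UPTIME_RE = re.compile(r" kernel: \[\s*(\d+\.\d+)\]")

    def find_reboots(lines):
        """Return the syslog timestamps of lines where the kernel uptime
        counter jumps backwards, which marks a reboot like the one at
        12:10:06 above."""
        reboots, prev = [], None
        for line in lines:
            m = UPTIME_RE.search(line)
            if not m:
                continue
            t = float(m.group(1))
            if prev is not None and t < prev:
                reboots.append(line[:15])  # e.g. "Jun 17 12:10:06"
            prev = t
        return reboots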