Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Lustre 2.10.5, Lustre 2.12.1
-
None
-
2
-
9223372036854775807
Description
Update to 2.10.5 now we are seeing periods of mass evictions from servers. On the server we have the following stack trace
Nov 7 11:33:12 nbp8-oss7 kernel: [531465.033253] Pid: 11080, comm: ll_ost01_220 3.10.0-693.21.1.el7.20180508.x86_64.lustre2105 #1 SMP Mon Aug 27 23:04:41 UTC 2018 Nov 7 11:33:12 nbp8-oss7 kernel: [531465.033260] Call Trace: Nov 7 11:33:12 nbp8-oss7 kernel: [531465.033274] [<ffffffffa0c1d0e0>] ptlrpc_set_wait+0x4c0/0x920 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038258] [<ffffffffa0bdae43>] ldlm_run_ast_work+0xd3/0x3a0 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038285] [<ffffffffa0bfbabb>] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038294] [<ffffffffa10e78a4>] ofd_intent_policy+0x444/0xa40 [ofd] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038318] [<ffffffffa0bda2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038346] [<ffffffffa0c03b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038390] [<ffffffffa0c89262>] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038425] [<ffffffffa0c8ceca>] tgt_request_handle+0x92a/0x1370 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038455] [<ffffffffa0c354bb>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038484] [<ffffffffa0c394a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038489] [<ffffffff810b1131>] kthread+0xd1/0xe0 Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038492] [<ffffffff816a14f7>] ret_from_fork+0x77/0xb0 Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038512] [<ffffffffffffffff>] 0xffffffffffffffff Nov 7 11:33:12 nbp8-oss7 kernel: [531465.038515] LustreError: dumping log to /tmp/lustre-log.1541619192.11080 Nov 7 11:33:14 nbp8-oss7 kernel: [531467.254898] LNet: Service thread pid 9724 was inactive for 303.19s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Nov 7 11:33:14 nbp8-oss7 kernel: [531467.310852] Pid: 9724, comm: ll_ost01_019 3.10.0-693.21.1.el7.20180508.x86_64.lustre2105 #1 SMP Mon Aug 27 23:04:41 UTC 2018 Nov 7 11:33:14 nbp8-oss7 kernel: [531467.310854] Call Trace: Nov 7 11:33:14 nbp8-oss7 kernel: [531467.310866] [<ffffffffa0c1d0e0>] ptlrpc_set_wait+0x4c0/0x920 [ptlrpc] Nov 7 11:33:14 nbp8-oss7 kernel: [531467.332869] [<ffffffffa0bdae43>] ldlm_run_ast_work+0xd3/0x3a0 [ptlrpc] Nov 7 11:33:14 nbp8-oss7 kernel: [531467.332902] [<ffffffffa0bfbabb>] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] Nov 7 11:33:14 nbp8-oss7 kernel: [531467.332912] [<ffffffffa10e78a4>] ofd_intent_policy+0x444/0xa40 [ofd] Nov 7 11:33:14 nbp8-oss7 kernel: [531467.332936] [<ffffffffa0bda2ba>] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc] Nov 7 11:33:15 nbp8-oss7 kernel: [531467.332988] [<ffffffffa0c03b53>] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc] Nov 7 11:33:15 nbp8-oss7 kernel: [531467.333032] [<ffffffffa0c89262>] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 7 11:33:15 nbp8-oss7 kernel: [531467.333067] [<ffffffffa0c8ceca>] tgt_request_handle+0x92a/0x1370 [ptlrpc] Nov 7 11:33:15 nbp8-oss7 kernel: [531467.333099] [<ffffffffa0c354bb>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc] Nov 7 11:33:15 nbp8-oss7 kernel: [531467.333128] [<ffffffffa0c394a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] Nov 7 11:33:15 nbp8-oss7 kernel: [531467.333134] [<ffffffff810b1131>] kthread+0xd1/0xe0 Nov 7 11:33:15 nbp8-oss7 kernel: [531467.333137] [<ffffffff816a14f7>] ret_from_fork+0x77/0xb0 Nov 7 11:33:15 nbp8-oss7 kernel: [531467.333158] [<ffffffffffffffff>] 0xffffffffffffffff
will upload to ftp:/uploads/LU11613/lustre-log.1541619192.11080
we didn't have rpctrace or dlmtrace so may not be very useful.
Could be related to https://jira.whamcloud.com/browse/LU-11613