Details
Bug
Resolution: Won't Fix
Blocker
None
Lustre 2.4.2
3
15338
Description
Our production MDS systems occasionally hang with many service threads stuck in ldlm_completion_ast(). The details were described in LU-4579, but that issue was closed when the patch that fixed how timeouts are reported landed.
When this happens, client access hangs and the MDS appears completely idle.
Issue Links
- is related to LU-4579 Timeout system horribly broken (Resolved)
Was this with lu4584 in its original form?
There's not much data here, but I initially observed similar lockups in my testing with your chaos tree.
I think using http://review.whamcloud.com/#/c/6511/ plus the latest form of the lu4584 patch has a high chance of eliminating this problem too.