[LU-7640] stuck mdt thread required reboot of mds Created: 08/Jan/16 Updated: 26/Apr/17 Resolved: 04/Feb/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mahmoud Hanafi | Assignee: | Zhenyu Xu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
| Issue Links: |
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
The MDS reported stuck MDT threads and dumped a stack trace <code> I am attaching /var/log/messages and a Lustre debug dump. The MDS needed to be rebooted to clear the error state. |
| Comments |
| Comment by Zhenyu Xu [ 08/Jan/16 ] |
|
It's a dup of
| Comment by Jay Lan (Inactive) [ 20/Jan/16 ] |
|
Why do you think this is a dup of
| Comment by Zhenyu Xu [ 21/Jan/16 ] |
|
The thread is waiting for a lock to be granted or cancelled (in ldlm_completion_ast()), and that never happens. And #17853 has a fix that makes ldlm_expired_completion_wait() return -ETIMEDOUT rather than 0, so that the thread won't get stuck. |
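
To illustrate the reasoning in the comment above, here is a minimal toy model in C of a wait loop whose timeout callback decides whether the waiter keeps waiting. It is not Lustre code: `toy_lock`, `toy_wait_for_grant()`, `expired_keep_waiting()` and `expired_give_up()` are hypothetical names, and the convention that a callback returning 0 means "continue waiting" while a nonzero return aborts the wait is an assumption taken from the description above, not from the patch itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock that may or may not ever be granted. */
struct toy_lock {
    int grant_after_ticks; /* tick at which the lock is granted; < 0 means never */
    int ticks_waited;      /* how long the waiter has been blocked so far */
};

/* Timeout callback: return 0 to keep waiting, nonzero to abort the wait. */
typedef int (*timeout_cb)(struct toy_lock *lock);

static bool toy_lock_granted(const struct toy_lock *lock)
{
    return lock->grant_after_ticks >= 0 &&
           lock->ticks_waited >= lock->grant_after_ticks;
}

/* Models the "returns 0" behaviour: the waiter goes back to sleep. */
static int expired_keep_waiting(struct toy_lock *lock)
{
    (void)lock;
    return 0;
}

/* Models the "returns -ETIMEDOUT" behaviour: the waiter is released. */
static int expired_give_up(struct toy_lock *lock)
{
    (void)lock;
    return -ETIMEDOUT;
}

/*
 * Wait until the lock is granted. Every `timeout` ticks the callback runs;
 * a nonzero result ends the wait with that error code.
 * `max_ticks` only bounds the simulation: a return value of 1 means the
 * waiter was still blocked when the simulation was cut off.
 */
static int toy_wait_for_grant(struct toy_lock *lock, int timeout,
                              int max_ticks, timeout_cb on_timeout)
{
    int since_timeout = 0;

    while (!toy_lock_granted(lock)) {
        if (lock->ticks_waited >= max_ticks)
            return 1; /* still stuck at the end of the simulation */

        lock->ticks_waited++;
        if (++since_timeout >= timeout) {
            int rc = on_timeout(lock);
            if (rc != 0)
                return rc; /* the callback asked to abort the wait */
            since_timeout = 0;
        }
    }
    return 0; /* lock granted */
}
```

With a lock that is never granted, calling `toy_wait_for_grant()` with `expired_keep_waiting` leaves the waiter blocked until the simulation limit, while `expired_give_up` returns `-ETIMEDOUT` at the first timeout. This only shows why the callback's return value matters for a stuck thread; it says nothing about which behaviour the final version of the patch ships.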
| Comment by Jay Lan (Inactive) [ 21/Jan/16 ] |
|
Thank you Zhenyu! |
| Comment by Mahmoud Hanafi [ 21/Jan/16 ] |
|
We had a crash after the patch was applied. |
| Comment by Zhenyu Xu [ 22/Jan/16 ] |
|
What's the crash backtrace?
| Comment by Peter Jones [ 04/Feb/16 ] |
|
duplicate of |
| Comment by John Hammond [ 26/Apr/17 ] |
|
Just to clarify, recent versions of http://review.whamcloud.com/17853 no longer contain the change to ldlm_expired_completion_wait() mentioned above and this should not be considered a duplicate of |