Details
- Bug
- Resolution: Fixed
- Minor
- None
- Lustre 2.5.3
- 3
- 9223372036854775807
Description
On all of our filesystems, the following error message is extremely common:
LustreError: 8746:0:(ost_handler.c:1776:ost_blocking_ast()) Error -2 syncing data on lock cancel
There is nothing else in the logs that gives any hint as to why this message is appearing.
Our filesystems all use osd-zfs, and we are currently running Lustre 2.5.3-5chaos (see github.com/chaos/lustre).
If this is a symptom of a bug, then please fix it. If this is not a symptom of a bug, then please stop scaring our system administrators with this message.
Issue Links
- is duplicated by
  LU-7007 (ost_handler.c:1779:ost_blocking_ast()) Error -2 syncing data on lock cancel - Resolved
- is related to
  LU-7308 LustreError: 16956:0:(ost_handler.c:1764:ost_blocking_ast()) Error -2 syncing data on lock cancel - Resolved
- is related to
  LU-5805 tgt_recov blocked and "waking for gap in transno" - Resolved
Oleg and I looked into this issue more closely. The current patch doesn't really solve the problem, because the race happens while the two destroy threads are getting and dropping the DLM lock, not while the actual destroy is happening. In master, the equivalent function tgt_blocking_ast() already has a check for dt_object_exists() and entirely skips the call into ofd_sync() that generates this message.
I think the right fix (for 2.5.x only) is to just skip this message for rc == -ENOENT as is already done in master.
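To illustrate the two approaches mentioned above, here is a minimal standalone sketch. It is not the actual Lustre patch: `struct fake_object`, `fake_sync()`, `sync_quiet_enoent()`, and `sync_if_exists()` are hypothetical names made up for this example, and the real code paths in ost_blocking_ast() and tgt_blocking_ast() involve DLM locks and OSD objects that are not modelled here. The only assumption is that syncing an already-destroyed object returns -ENOENT, which is the "Error -2" in the log message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an on-disk object; not a Lustre type. */
struct fake_object {
	bool exists;
};

/* Hypothetical stand-in for the sync call: fails with -ENOENT when the
 * object has already been destroyed by a racing thread. */
static int fake_sync(const struct fake_object *obj)
{
	return obj->exists ? 0 : -ENOENT;
}

/* 2.5.x-style approach from the comment: still call sync, but do not
 * report -ENOENT. Returns true if an error message was printed. */
static bool sync_quiet_enoent(const struct fake_object *obj)
{
	int rc = fake_sync(obj);

	if (rc != 0 && rc != -ENOENT) {
		fprintf(stderr, "Error %d syncing data on lock cancel\n", rc);
		return true;
	}
	return false;
}

/* master-style approach from the comment: check that the object exists
 * first and skip the sync entirely if it does not. Returns true if an
 * error message was printed. */
static bool sync_if_exists(const struct fake_object *obj)
{
	int rc;

	if (!obj->exists)
		return false;

	rc = fake_sync(obj);
	if (rc != 0) {
		fprintf(stderr, "Error %d syncing data on lock cancel\n", rc);
		return true;
	}
	return false;
}
```

In both variants an object that was destroyed by the other thread produces no console message, while any other sync failure would still be reported.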