Details
Description
To improve sync performance on ZFS-based OSDs, Lustre must be updated to use a ZFS ZIL device. This performance work was originally planned as part of the Lustre/ZFS integration but has not yet been completed. I'm opening this issue to track it.
Issue Links
- is blocked by
  - LU-4215 Some expected improvements for OUT (Open)
- is blocking
  - LU-2887 sanity-quota test_12a: slow due to ZFS VMs sharing single disk (Resolved)
  - LU-7895 zfs metadata performance improvements (Resolved)
- is related to
  - LU-2716 DNE on ZFS create remote directory suffers from long sync. (Open)
  - LU-6836 sanity-quota test_4a: Passed grace time 12, 1436542665, 1436542679 (Resolved)
  - LU-10392 LustreError: 82980:0:(fid_handler.c:329:__seq_server_alloc_meta()) srv-lglossy-MDT0002: Allocated super-sequence failed: rc = -115 (Resolved)
  - LU-2085 sanityn test_16 (fsx) ran over its Autotest time (Closed)
  - LU-7426 DNE3: improve llog format for remote update llog (Open)
  - LU-14678 ldiskfs fast commit feature (Open)
Alex, are you recording logical operations into the ZIL or physical blocks? Isn't it true that logical updates (e.g. set bit X, decrement free count, etc.) could be applied in any order? I'd think that any updates to a contended resource will at least have internal locking, so they should be safe to replay in some arbitrary order later. Obviously, this can't handle some cases (e.g. running out of space within an llog file), but that should never happen.
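To make the ordering question concrete, here is a minimal sketch of why logical records of this kind could be replayed in any order. It is a toy model, not Lustre or ZFS code: the record names `set_bit` and `dec_free` and the `bitmap`/`free` state are hypothetical stand-ins for the examples above.

```python
from itertools import permutations

def apply_op(state, op):
    """Apply one hypothetical logical record to the in-memory state."""
    kind, arg = op
    if kind == "set_bit":
        state["bitmap"] |= 1 << arg      # setting a bit is order-independent
    elif kind == "dec_free":
        state["free"] -= arg             # decrements commute with each other
    else:
        raise ValueError(f"unknown record type: {kind}")

def replay(initial, ops):
    """Replay a sequence of logical records on a copy of the initial state."""
    state = dict(initial)
    for op in ops:
        apply_op(state, op)
    return state

initial = {"bitmap": 0, "free": 100}
ops = [("set_bit", 3), ("dec_free", 1), ("set_bit", 7), ("dec_free", 1)]

# Replay every possible ordering and collect the distinct final states.
results = {tuple(sorted(replay(initial, p).items())) for p in permutations(ops)}
```

Here `results` ends up with a single element, because every ordering produces the same final state. A record like "append to an llog at offset N" would not have this property, which is the kind of exception mentioned above.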
As I understand it, ZIL records are not meant to span multiple TXGs. They only optimize sync operations that happen within a single TXG, so that those operations can commit and reply to the client more quickly. If the parent TXG is committed, it should be possible to drop all ZIL records for that TXG without further processing (i.e. the ZIL is a writethrough cache for the TXG, not writeback). If the MDT or OST crashes before TXG commit, then the first thing to recover, before any other pool update, is the set of ZIL updates, which will reconstruct the "replied as committed" parts of the incomplete TXG. I'd think we also need Commit on Share (COS) to handle uncommitted dependent updates within that TXG, but any earlier sync updates should already be in the ZIL before they reply and should not need further processing.
Have I misunderstood what you are implementing?
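The writethrough model I have in mind can be sketched as follows. This is a toy model only, with hypothetical names (`ToyPool`, `update`, `commit_txg`, `crash_and_recover`); it does not use the real ZFS ZIL interfaces and is not what the patch implements.

```python
class ToyPool:
    """Toy model of a pool with TXGs and a writethrough intent log."""

    def __init__(self):
        self.committed = {}   # state on stable storage as of the last TXG commit
        self.pending = {}     # in-memory changes belonging to the open TXG
        self.open_txg = 1
        self.zil = []         # (txg, key, value) records persisted for sync ops

    def update(self, key, value, sync=False):
        """Apply an update to the open TXG; sync updates are also logged."""
        self.pending[key] = value
        if sync:
            # The record is stable before the client sees "committed".
            self.zil.append((self.open_txg, key, value))
        return self.open_txg

    def commit_txg(self):
        """Commit the open TXG and drop the ZIL records that belong to it."""
        self.committed.update(self.pending)
        self.pending = {}
        done = self.open_txg
        self.open_txg += 1
        # Writethrough: records of a committed TXG need no further processing.
        self.zil = [rec for rec in self.zil if rec[0] > done]
        return done

    def crash_and_recover(self):
        """Lose in-memory state, then replay the ZIL before anything else."""
        self.pending = {}
        for _txg, key, value in self.zil:
            self.pending[key] = value
        self.commit_txg()


pool = ToyPool()
pool.update("a", 1, sync=True)
pool.update("b", 2)                 # async, rides along with the TXG
pool.commit_txg()                   # ZIL record for "a" is simply dropped

pool.update("c", 3, sync=True)      # replied as committed via the ZIL
pool.update("d", 4)                 # async, never replied as committed
pool.crash_and_recover()
```

After recovery in this model, `a`, `b` and `c` are on stable storage and `d` is lost, which is acceptable because the client was never told `d` had committed. The sketch ignores dependencies between a sync update and earlier async updates in the same TXG, which is the case where COS would be needed.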