The transaction behavior between ldiskfs and ZFS is exactly the same today. Multiple RPCs are batched together into a single disk transaction, which is committed to disk every few seconds, or sooner under space pressure (e.g. if journal or dirty-data limits are being approached). Lustre does not make any filesystem modifications until after it has reserved transaction space (in the "declare" phase) and started the transaction handle (which takes a reference on the disk transaction). After the transaction handle is started, all filesystem modifications are atomic and will either be committed together, or lost if the transaction doesn't commit (e.g. on a crash).
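For reference, a minimal sketch of that declare/start/modify/commit pattern using the public DMU API (illustrative only, with made-up function and parameter names, not the actual osd-zfs code):

    static int
    example_write(objset_t *os, uint64_t object, uint64_t offset,
        uint64_t size, const void *buf)
    {
            dmu_tx_t *tx = dmu_tx_create(os);
            int error;

            /* "declare" phase: reserve transaction space before modifying */
            dmu_tx_hold_write(tx, object, offset, (int)size);

            /* start the handle: takes a reference on the open disk transaction */
            error = dmu_tx_assign(tx, TXG_WAIT);
            if (error != 0) {
                    dmu_tx_abort(tx);
                    return (error);
            }

            /* everything done under this handle commits (or is lost) together */
            dmu_write(os, object, offset, size, buf, tx);

            dmu_tx_commit(tx);      /* drop the reference; the txg syncs later */
            return (0);
    }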
The unordered updates within a transaction help improve performance, because they increase concurrency within the filesystem. If we had to hold a single huge lock across the whole filesystem for each update, this would hurt performance significantly. Instead, we only hold a lock on each object being modified (e.g. a llog file, or the leaf block of a directory) to ensure concurrent changes do not corrupt the data. Since all of the related updates in a single transaction will commit together, it doesn't matter if they are slightly unordered with respect to each other, as they will not cross a transaction boundary.
As for writethrough of large bulk data to disk, this is already done by the ZPL+ZIL today, depending on configuration options. Small writes go directly into the ZIL, which is a good fit for Lustre because it can also pack small writes directly into the RPC request (16KB today, up to 64KB with my patch https://review.whamcloud.com/36587). For large writes, the data is written to its actual location on disk to avoid double IO on large amounts of data, which would typically overload the ZIL device.
The large write data is written to newly allocated and unused disk blocks (as is all data in a COW filesystem), and the block pointer is written to the ZIL. If the transaction commits, the ZIL record is dropped and the block pointer is already part of the transaction. If the transaction does not commit, but the ZIL record has been written, the ZIL replay will use the data written to the "free" blocks on disk.
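The size-based split is roughly the following policy (simplified from what zfs_log_write() does today; zfs_immediate_write_sz defaults to 32KB, the "sync" flag is shorthand for an O_SYNC/O_DSYNC write, and logbias=throughput or the lack of a separate log device pushes large writes to the "write in place" path):

    itx_wr_state_t write_state;

    if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)
            write_state = WR_INDIRECT;      /* always write data in place */
    else if (!spa_has_slog(zilog->zl_spa) && resid >= zfs_immediate_write_sz)
            write_state = WR_INDIRECT;      /* large write: only the block
                                             * pointer goes into the ZIL record */
    else if (sync)
            write_state = WR_COPIED;        /* small sync write: data is copied
                                             * directly into the log record */
    else
            write_state = WR_NEED_COPY;     /* copy later, only if a sync
                                             * actually forces it out */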
Note that the ZIL does not necessarily make all IO faster. The ZIL is only written to disk when there is a sync operation. It also requires the filesystem to track the dependencies of all updates in memory, so that dependent updates are all written to the ZIL and the filesystem is not left in an inconsistent state after a crash and ZIL recovery. This is where the complexity arises for Lustre, if the ZIL records for one RPC are written independently of those for another (not directly related) RPC.
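(For context: the in-memory intent-log entries are only pushed out to the log device when something calls zil_commit(), as in the simplified ZPL fsync path below; everything else just waits for the normal txg commit.)

    /* simplified from the ZPL fsync path: make this object durable by
     * forcing out all ZIL records (and their dependencies) it needs */
    zil_commit(zfsvfs->z_log, zp->z_id);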
In some cases, two RPCs are not actually dependent on each other, but may happen to share disk blocks (e.g. in the ChangeLog). If we have to write everything that modified the ChangeLog to the ZIL, then every sync will write everything to disk, and we are no further ahead than without the ZIL. My alternate proposal (which nobody is currently working on) is to log the RPCs as "logical" journal records, rather than the "physical" block records that exist today. This would make them incompatible with the ZPL+ZIL records, but should avoid the problems that were seen with the previous patch.
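To make the idea concrete, a purely hypothetical layout for such a "logical" record might look like the following (only the lr_t header is real; the Lustre fields are invented for illustration): instead of logging the physical blocks an RPC dirtied, the record describes the RPC itself, and replay re-applies it through the normal request-processing path.

    typedef struct {
            lr_t            lr_common;      /* standard ZIL record header */
            uint64_t        lr_xid;         /* client RPC xid */
            uint64_t        lr_transno;     /* Lustre transaction number */
            uint32_t        lr_opcode;      /* e.g. OST_WRITE, MDS_REINT */
            uint32_t        lr_bodylen;     /* length of the packed request */
            /* packed request body (or the needed subset of it) follows */
    } lr_lustre_rpc_t;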
This idea makes a lot of sense to me. It would fit naturally with the way the ZIL currently works, which would be ideal. As it turns out, adding new ZIL log record types is something we've recently been looking into in order to handle the following proposed changes:
https://github.com/zfsonlinux/zfs/pull/9414 - renameat(2) flags RENAME_*
https://github.com/zfsonlinux/zfs/pull/9078 - xattr=sa syncing to ZIL
There's an existing proposal to add support for new ZIL record types (currently not being worked on).
https://github.com/zfsonlinux/zfs/pull/9078#issuecomment-553733379
This would lay the groundwork for us to be able to register a Lustre-specific ZIL feature flag. Ideally, that could be implemented in such a way that any new Lustre ZIL records are passed to Lustre for processing. This would let us avoid duplicating Lustre-specific logic in the ZFS code. We'd then want to refuse any read-write ZPL mount when there is a ZIL to replay.
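As a purely hypothetical sketch of what that could look like (no such interface exists today, and all osd_* names below are invented), osd-zfs would supply its own replay vector for the new record types, modelled on the existing zil_replay() interface that the ZPL already uses:

    /* replay callback for a hypothetical Lustre RPC record type */
    static int
    osd_zil_replay_rpc(void *arg1, void *arg2, boolean_t byteswap)
    {
            /* arg2 would be the logical Lustre RPC record; re-queue it
             * through the normal Lustre request-processing path */
            return (0);
    }

    /* indexed by ZIL transaction type, like the ZPL's zfs_replay_vector[];
     * the slots for the new Lustre record types would point at the
     * handlers above */
    static zil_replay_func_t *osd_zil_replay_vector[TX_MAX_TYPE];

    /* during mount/recovery, osd-zfs (rather than the ZPL) would call: */
    zil_replay(os, osd, osd_zil_replay_vector);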