[LU-15686] loopdev: op 0x9:(WRITE_ZEROES) not supported on Lustre / ZFS Created: 24/Mar/22 Updated: 25/Mar/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Lukasz Flis | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | e2fsprogs | ||
| Environment: |
Lustre 2.15.0RC2 + ZFS 2.0.7 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Dear WC Team,

To mitigate the high cost of calling ftruncate on ZFS-backed Lustre (the call Fortran programs use to extend a file by a fixed chunk), we are using locally mounted loop devices backed by files on Lustre. The underlying file is created as a sparse file, initialized as a journal-less ext4 filesystem with the lazy_itable_init=1,stride=32,stripe-width=256 parameters, and mounted as temporary job storage. This solution works well with LDISKFS-based OSTs.

In the ZFS environment we are seeing a lot of "operation not supported" errors:

[1118332.841747] blk_update_request: operation not supported error, dev loop0, sector 457191680 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
[1118349.509300] blk_update_request: operation not supported error, dev loop0, sector 457195776 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0

Apart from the significant volume of log messages, we have not seen any application instability so far. This happens regardless of whether the nodiscard or discard mount flag is used.
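
To gauge how much of the kernel log these messages account for, they can be tallied per device and operation. The following is only a minimal sketch, not an existing tool: the regular expression is written against the two sample lines quoted in this description, and `summarize_blk_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modelled on the two blk_update_request lines in the description.
# Other kernel versions may format the message differently (assumption).
BLK_ERR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+blk_update_request: (?P<err>.+?) error, "
    r"dev (?P<dev>\S+), sector (?P<sector>\d+) "
    r"op (?P<op>0x[0-9a-fA-F]+):\((?P<name>\w+)\) flags (?P<flags>0x[0-9a-fA-F]+)"
)


def summarize_blk_errors(text):
    """Count block-layer errors in dmesg text, keyed by (device, op name, error)."""
    counts = Counter()
    for m in BLK_ERR.finditer(text):
        counts[(m.group("dev"), m.group("name"), m.group("err"))] += 1
    return counts


# The two log lines from the description, used as sample input.
SAMPLE = (
    "[1118332.841747] blk_update_request: operation not supported error, "
    "dev loop0, sector 457191680 op 0x9:(WRITE_ZEROES) flags 0x800 "
    "phys_seg 0 prio class 0\n"
    "[1118349.509300] blk_update_request: operation not supported error, "
    "dev loop0, sector 457195776 op 0x9:(WRITE_ZEROES) flags 0x800 "
    "phys_seg 0 prio class 0\n"
)

for key, n in summarize_blk_errors(SAMPLE).items():
    print(key, n)
```

In practice the input would come from `dmesg` or the system journal rather than from an inline string.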
|
| Comments |
| Comment by Lukasz Flis [ 24/Mar/22 ] |
|
One clarification: the problem did not appear with LDISKFS on 2.12 (there was no ll_fallocate function in 2.12). The problem is likely to be present with LDISKFS on 2.15 as well. |
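
One way to check this directly is to probe which fallocate modes the filesystem holding the backing file accepts. This is a sketch under assumptions that are not confirmed in this ticket: that the loop driver services WRITE_ZEROES by calling fallocate on the backing file with FALLOC_FL_PUNCH_HOLE or FALLOC_FL_ZERO_RANGE (plus FALLOC_FL_KEEP_SIZE), and that the host is 64-bit Linux. `probe_fallocate` is a hypothetical helper name; the flag values are those from `<linux/falloc.h>`.

```python
import ctypes
import os

# Mode flags as defined in <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02
FALLOC_FL_ZERO_RANGE = 0x10

# Resolve fallocate(2) from the C library already loaded into the process.
_libc = ctypes.CDLL(None, use_errno=True)
# off_t is assumed to be 64 bits wide (64-bit Linux).
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def probe_fallocate(path, mode, offset=0, length=4096):
    """Return 0 if fallocate(mode) succeeds on path, otherwise the errno.

    WARNING: a successful call zeroes or deallocates the given range,
    so run this only against a scratch file, never a live image.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        if _libc.fallocate(fd, mode, offset, length) == 0:
            return 0
        return ctypes.get_errno()
    finally:
        os.close(fd)
```

Calling `probe_fallocate(scratch, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)` and `probe_fallocate(scratch, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE)` on a scratch file in the same Lustre directory as the image would show whether either mode returns EOPNOTSUPP there.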
| Comment by Andreas Dilger [ 24/Mar/22 ] |
|
The "discard" mount option for ext4 is not recommended to be used, as it causes a significant performance overhead.

As for the errors appearing even without "-o discard", is it possible that the errors are generated during mkfs time? Running newer mke2fs will try to trim the whole device to free flash erase blocks and thin-provisioned storage, and use "write same" to avoid explicitly zeroing the inode table. However, if "lazy_itable_init" is used, then the itable zeroing is deferred to a kernel thread that also tries to zero the inode table blocks with "write same" after the filesystem is mounted.

There is an experimental patch to mke2fs that will assume the whole filesystem is "zeroed", which can be used for the case of loopback devices that are on newly created sparse files that are known to contain only zeroes:

This patch is included in upstream e2fsprogs v1.46.5, but not yet in the Lustre e2fsprogs-1.46.2-wc4. We have never tested the assume_storage_prezeroed function, but if this version is only being used on the client against a loopback file, and not on the server, it shouldn't cause any problems. |
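
For illustration, a job prolog could assemble the mke2fs command line along these lines. This is an untested sketch: it assumes e2fsprogs 1.46.5 or later, that the option is spelled `assume_storage_prezeroed=1` under `-E`, and that `nodiscard` is passed to skip the trim at mkfs time. `build_mke2fs_argv` is a hypothetical helper, and the other parameters are copied from the description.

```python
def build_mke2fs_argv(device, prezeroed=True):
    """Build an mke2fs argv for a journal-less ext4 filesystem on `device`.

    With prezeroed=True, mke2fs is told that the storage already reads
    back as zeroes, which is only safe for a freshly created sparse file.
    """
    ext_opts = [
        "lazy_itable_init=1",  # parameters taken from the description
        "stride=32",
        "stripe-width=256",
        "nodiscard",           # do not try to trim the device at mkfs time
    ]
    if prezeroed:
        # Assumed spelling of the option, available from e2fsprogs 1.46.5.
        ext_opts.append("assume_storage_prezeroed=1")
    return [
        "mke2fs", "-t", "ext4",
        "-O", "^has_journal",  # journal-less ext4
        "-E", ",".join(ext_opts),
        device,
    ]


# To run it: subprocess.run(build_mke2fs_argv("/dev/loop0"), check=True)
print(" ".join(build_mke2fs_argv("/dev/loop0")))
```

The helper only builds the argument list, so the exact options can be reviewed before anything is formatted.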
| Comment by Lukasz Flis [ 25/Mar/22 ] |
|
Hi Andreas,

During mount we supply the -o nodiscard flag, which seems to be effective, i.e.:

[Wed Mar 23 14:24:57 2022] EXT4-fs (loop0): mounted filesystem without journal. Opts: nodiscard

Looking at the logs, I am sure that the "operation not supported" errors appear after mkfs has completed and the filesystem has been mounted. This seems to be an effect of using a loop-mounted ext4 filesystem.
|