[LU-15686] loopdev: op 0x9:(WRITE_ZEROES) not supported on Lustre / ZFS Created: 24/Mar/22  Updated: 25/Mar/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Lukasz Flis Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: e2fsprogs
Environment:

Lustre 2.15.0RC2 + ZFS 2.0.7


Issue Links:
Related
is related to LU-15607 update e2fsprogs to 1.46.5 Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Dear WC Team,

To mitigate the high cost of calling ftruncate on ZFS-backed Lustre (a call that Fortran programs use to extend a file by a fixed chunk), we are using locally mounted loop devices backed by files on Lustre.

The underlying file is created as a sparse file, formatted as journal-less ext4 with the lazy_itable_init=1,stride=32,stripe-width=256 parameters, and mounted as temporary job storage.
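The setup described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual script: the helper names, the image path, the mount point, and the use of -F are assumptions, and the commands are only built as argument lists, not executed.

```python
import os
import tempfile


def make_sparse_file(path, size_bytes):
    """Create a sparse file with the given apparent size; no data blocks are written."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def mkfs_command(image_path):
    """Build (but do not run) an mkfs.ext4 command line for a journal-less ext4 image."""
    return [
        "mkfs.ext4",
        "-F",                  # assumption: force, since the target is a regular file
        "-O", "^has_journal",  # journal-less ext4, as described in the report
        "-E", "lazy_itable_init=1,stride=32,stripe-width=256",
        image_path,
    ]


def mount_command(image_path, mount_point):
    """Build (but do not run) a loop mount command using the nodiscard option."""
    return ["mount", "-o", "loop,nodiscard", image_path, mount_point]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(mkfs_command("/lustre/scratch/job.img")))
    print(" ".join(mount_command("/lustre/scratch/job.img", "/mnt/job")))
```

Running the two printed commands requires root privileges and e2fsprogs. With lazy_itable_init=1, inode table zeroing is deferred until after the filesystem is mounted.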

This solution works well with LDISKFS-based OSTs. In the ZFS environment we are seeing a lot of "operation not supported" errors:

[1118332.841747] blk_update_request: operation not supported error, dev loop0, sector 457191680 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
[1118349.509300] blk_update_request: operation not supported error, dev loop0, sector 457195776 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
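To gauge how often and where these errors occur, the kernel log lines can be parsed and summarized. The sketch below is a hypothetical helper (the function names and the regular expression are not part of any existing tool) that extracts the device, sector, and operation from lines in the format shown above.

```python
import re
from collections import Counter

# Matches lines such as:
# blk_update_request: operation not supported error, dev loop0, sector 457191680 op 0x9:(WRITE_ZEROES) ...
LINE_RE = re.compile(
    r"blk_update_request: (?P<error>.+?) error, dev (?P<dev>\S+), "
    r"sector (?P<sector>\d+) op (?P<opcode>0x[0-9a-fA-F]+):\((?P<opname>\w+)\)"
)


def parse_blk_errors(lines):
    """Return one dict per matching log line; non-matching lines are skipped."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            events.append({
                "dev": m["dev"],
                "sector": int(m["sector"]),
                "op": m["opname"],
                "error": m["error"],
            })
    return events


def summarize(events):
    """Count events per (device, operation, error) combination."""
    return Counter((e["dev"], e["op"], e["error"]) for e in events)


if __name__ == "__main__":
    sample = [
        "[1118332.841747] blk_update_request: operation not supported error, dev loop0, "
        "sector 457191680 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0",
        "[1118349.509300] blk_update_request: operation not supported error, dev loop0, "
        "sector 457195776 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0",
    ]
    for key, count in summarize(parse_blk_errors(sample)).items():
        print(key, count)
```

The two sample sectors are 4096 sectors apart, which is 2 MiB given the block layer's 512-byte sector unit.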

Apart from the significant volume of log messages, we have not seen any application instability so far.

This happens regardless of whether the nodiscard or discard mount option is used.

 

 



 Comments   
Comment by Lukasz Flis [ 24/Mar/22 ]

One clarification: the problem did not appear with LDISKFS on 2.12 (there was no ll_fallocate function in 2.12).

The problem is likely to be present with LDISKFS on 2.15 as well.
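Since the Linux loop driver services WRITE_ZEROES requests by calling fallocate on the backing file, one way to narrow this down is to probe which fallocate modes the backing filesystem accepts. The following is a minimal sketch that assumes 64-bit Linux with glibc. The probe_fallocate helper is hypothetical, and which mode the loop driver actually requests depends on the kernel version and request flags, so both hole punching and zero-range are tried.

```python
import ctypes
import errno
import os
import tempfile

# Mode flags from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02
FALLOC_FL_ZERO_RANGE = 0x10

MODES = {
    "PUNCH_HOLE|KEEP_SIZE": FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
    "ZERO_RANGE": FALLOC_FL_ZERO_RANGE,
}


def probe_fallocate(directory, length=1 << 20):
    """Try each fallocate mode on a scratch file created in `directory`.

    Returns a dict mapping mode name to 0 on success or the errno on failure.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int

    results = {}
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        f.truncate(length)  # sparse scratch file, removed on close
        for name, mode in MODES.items():
            ctypes.set_errno(0)
            rc = libc.fallocate(f.fileno(), mode, 0, length)
            results[name] = 0 if rc == 0 else ctypes.get_errno()
    return results


if __name__ == "__main__":
    # Replace with a directory on the Lustre mount under test.
    for name, err in probe_fallocate(tempfile.gettempdir()).items():
        status = "ok" if err == 0 else errno.errorcode.get(err, str(err))
        print(f"{name}: {status}")
```

A result of EOPNOTSUPP for a mode on the Lustre mount would be consistent with the "operation not supported" messages logged for the loop device, although it would not by itself prove that this is the failing path.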

Comment by Andreas Dilger [ 24/Mar/22 ]

Using the "discard" mount option for ext4 is not recommended, as it causes significant performance overhead.

As for the errors appearing even without "-o discard", is it possible that the errors are generated during mkfs time? Running newer mke2fs will try to trim the whole device to free flash erase blocks and thin-provisioned storage, and use "write same" to avoid explicitly zeroing the inode table. However, if "lazy_itable_init" is used, then the itable zeroing is deferred to a kernel thread that also tries to zero the inode table blocks with "write same" after the filesystem is mounted.

There is an experimental patch to mke2fs that will assume the whole filesystem is "zeroed", which can be used for the case of loopback devices that are on newly created sparse files that are known to contain only zeroes:
https://patchwork.ozlabs.org/project/linux-ext4/patch/20210921034203.323950-1-sarthakkukreti@google.com/

This patch is included in upstream e2fsprogs v1.46.5, but not yet in the Lustre e2fsprogs-1.46.2-wc4. We have never tested the assume_storage_prezeroed function, but if this version is only being used on the client against a loopback file, and not on the server, it shouldn't cause any problems.

Comment by Lukasz Flis [ 25/Mar/22 ]

Hi Andreas,

At mount time we supply the -o nodiscard option, which seems to take effect, e.g.:

[Wed Mar 23 14:24:57 2022] EXT4-fs (loop0): mounted filesystem without journal. Opts: nodiscard

Looking at the logs, I am sure that the "operation not supported" errors appear after mkfs has completed and the filesystem has been mounted. This seems to be an effect of using a loop-mounted ext4 filesystem.

 

Generated at Sat Feb 10 03:20:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.