[LU-16298] periodically write ldiskfs superblock Created: 03/Nov/22  Updated: 28/Aug/23  Resolved: 19/Jul/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Major
Reporter: Andreas Dilger Assignee: Vitaliy Kuznetsov
Resolution: Fixed Votes: 0
Labels: ldiskfs

Issue Links:
Related
is related to LU-16982 Crash lustre after umount -d -f /mnt/... Resolved

 Description   

The ext4 superblock has an "s_kbytes_written" field that is supposed to contain the total number of kilobytes written to the block device since the filesystem was formatted. The in-memory counter that tracks the amount written to the block device during the current mount is exposed to userspace via /sys/fs/{ldiskfs,ext4}/<device>/session_write_kbytes, and this is added to s_kbytes_written to generate the "lifetime" writes shown at .../lifetime_write_kbytes. This would be useful for tracking the total lifetime writes on flash OST and MDT devices, assuming they are not reformatted during usage (which should be rare). The in-memory block device writes counter is written to disk via ext4_update_super().
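
As a rough illustration of how the two sysfs counters relate, here is a minimal Python sketch. It assumes each attribute file holds a single decimal integer in KiB and uses the attribute names from the upstream ext4 sysfs ABI (session_write_kbytes, lifetime_write_kbytes). The helper names are hypothetical, and the on-disk base is only inferred as lifetime minus session.

```python
from pathlib import Path


def read_kbytes(attr_dir, name):
    """Read one sysfs counter file holding a single decimal integer (KiB)."""
    return int(Path(attr_dir, name).read_text().strip())


def written_summary(attr_dir):
    """Return the session and lifetime counters plus the implied on-disk base.

    attr_dir is the per-device directory, e.g. /sys/fs/ldiskfs/<device>
    or /sys/fs/ext4/<device>.
    """
    session = read_kbytes(attr_dir, "session_write_kbytes")
    lifetime = read_kbytes(attr_dir, "lifetime_write_kbytes")
    # lifetime = s_kbytes_written (as loaded at mount) + session,
    # so the base value is the difference of the two counters.
    return {
        "session": session,
        "lifetime": lifetime,
        "on_disk_base": lifetime - session,
    }
```

The two files are read one after the other, so on a busy device the inferred base may be off by whatever was written between the two reads.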

Unfortunately, upstream ext4 after commit v3.5-rc5-19-g4d47603d9703 no longer writes out the superblock on any regular basis. The superblock is only written at unmount/remount time, when ext4_error() is hit, or when the filesystem is resized or frozen. This means that the s_kbytes_written counter may frequently be inaccurate due to missing updates if the filesystem is not unmounted cleanly each time (e.g. due to crash/reboot/STONITH).

Having a periodic write of the superblock (e.g. once per hour, or at another tunable interval) would not add any measurable overhead to the system, but would ensure that the s_kbytes_written counter is kept reasonably up to date, so that at most an hour's worth of writes would be lost in case of a crash and remount.
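
To make the "at most one interval" bound concrete, here is a small back-of-the-envelope model in Python. It assumes a constant write rate and a flush at fixed multiples of the interval since mount. The function name and parameters are hypothetical, and the model is not derived from any patch.

```python
def lost_kbytes_on_crash(write_rate_kb_per_s, crash_time_s, flush_interval_s):
    """KiB missing from the on-disk counter if the node crashes at crash_time_s.

    Assumes a constant write rate and a superblock write every
    flush_interval_s seconds, counted from mount time.
    flush_interval_s=None models the current behaviour (no periodic write).
    """
    if flush_interval_s is None:
        last_flush_s = 0
    else:
        # Time of the most recent periodic write before the crash.
        last_flush_s = (crash_time_s // flush_interval_s) * flush_interval_s
    return write_rate_kb_per_s * (crash_time_s - last_flush_s)
```

For example, at 1000 KiB/s with a crash 37800 seconds (10.5 hours) after mount, an hourly write leaves 1,800,000 KiB unaccounted for, versus 37,800,000 KiB with no periodic write at all.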



 Comments   
Comment by Andreas Dilger [ 03/Nov/22 ]

Implementation notes:

  • calling ext4_update_super() also has the added benefit that the superblock s_free_inodes_count and s_free_blocks_count fields are updated on disk
  • there are separate s_kbytes_written values for the on-disk superblock and in-memory superblock
  • before writing the superblock to disk, only the on-disk superblock value should be updated, so that the amount is not double-counted in lifetime_write_kbytes. This is already handled correctly by ext4_update_super()
  • the write should be skipped if the number of bytes written is very small (e.g. under 16MB) so that the superblock write itself does not spin up a sleeping disk
  • it might be possible to have a regular check (e.g. at journal commit, testing whether (now - s_wtime) > interval) to trigger the superblock write, to ensure that it is written only while the disk is active, rather than on a timer that may trigger when the disk is idle.
  • using s_wtime for the interval check (with ext4_get_tstamp()) is also convenient, since that is updated in ext4_update_super() at the same time the counters are written to disk
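
The notes above can be sketched as a toy userspace model in Python. This is not kernel code and not the landed patch: the class, method names and the commit-hook framing are hypothetical, and only the (now - s_wtime) > interval check, the 16MB threshold and the "update only the on-disk value" rule are taken from the list.

```python
MIN_KBYTES = 16 * 1024  # 16 MiB threshold from the notes, expressed in KiB


class SuperblockModel:
    """Toy model of the on-disk and in-memory write counters."""

    def __init__(self, disk_kbytes_written, now):
        self.disk_kbytes_written = disk_kbytes_written  # on-disk s_kbytes_written
        self.base_kbytes_written = disk_kbytes_written  # in-memory copy, fixed at mount
        self.session_kbytes = 0                         # written since mount
        self.s_wtime = now                              # time of last superblock write

    def lifetime_kbytes(self):
        # Lifetime total as reported to userspace: mount-time base plus session.
        return self.base_kbytes_written + self.session_kbytes

    def maybe_write_super(self, now, interval_s):
        """Check meant to run at journal commit; returns True if a write happened."""
        unflushed = self.lifetime_kbytes() - self.disk_kbytes_written
        if now - self.s_wtime <= interval_s or unflushed < MIN_KBYTES:
            return False
        # Update only the on-disk value. The in-memory base is left alone so
        # that lifetime_kbytes() does not count the session twice.
        self.disk_kbytes_written = self.lifetime_kbytes()
        self.s_wtime = now
        return True
```

Because the check only runs when a commit happens, an idle device never triggers a superblock write, which is the point of tying it to journal activity rather than a timer.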
Comment by Gerrit Updater [ 16/Jun/23 ]

"Vitaliy Kuznetsov <vkuznetsov@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51340
Subject: LU-16298 ldiskfs: Periodically write ldiskfs superblock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 018987285af2c40f9fcb1a0008e5d9283658dd7d

Comment by Gerrit Updater [ 19/Jul/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51340/
Subject: LU-16298 ldiskfs: Periodically write ldiskfs superblock
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e27a7b33d6351ff8b8bae101079af88f4eedac99

Comment by Peter Jones [ 19/Jul/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:25:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.