[LU-15703] ext4/jbd2 fast commit breaks luster ldiskfs Created: 29/Mar/22  Updated: 05/Dec/22  Resolved: 06/Jul/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.1

Type: Bug Priority: Minor
Reporter: Shaun Tancheff Assignee: Shaun Tancheff
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14678 ldiskfs fast commit feature Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Fast commits for ext4 - LWN.net
https://lwn.net/Articles/842385/#:~:text=The%20Linux%205.10%20release%20included,%2C%20lighter%2Dweight%20journaling%20method.

The Linux 5.10 release included a change that is expected to significantly increase the performance of the ext4 filesystem; it goes by the name "fast commits" and introduces a new, lighter-weight journaling method.

Unfortunately this breaks ldiskfs. Disable ext4/jbd2 fast commit

Error (on MDS):

[   59.357528] Lustre: DEBUG MARKER: mds8.test: executing set_hostid
[   64.281515] LDISKFS-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[   69.918052] LDISKFS-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[   69.960957] RPC: Registered named UNIX socket transport module.
[   69.960974] RPC: Registered udp transport module.
[   69.960984] RPC: Registered tcp transport module.
[   69.960993] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   69.984978] Lustre: Lustre: Build Version: 2.14.54_522_g3f584cf_dirty
[   70.021102] LNet: Added LNI 192.168.122.175@tcp [8/256/0/180]
[   70.021438] LNet: Accept secure, port 988
[   70.084523] LDISKFS-fs (vdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,user_xattr,no_mbcache,nodelalloc
[   70.179250] Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space: rc = -61
[   71.214201] Lustre: lustre-MDT0000: new disk, initializing
[   71.291209] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[   71.297880] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
[   71.320210] ------------[ cut here ]------------
[   71.320249] WARNING: CPU: 0 PID: 1978 at fs/jbd2/transaction.c:1512 jbd2_journal_dirty_metadata+0x201/0x220 [jbd2]
[   71.320283] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sunrpc(E) ldiskfs(OE) mbcache(E) jbd2(E) libcfs(OE) fuse(E) rfkill(E) intel_rapl_msr(E) intel_rapl_common(E) intel_pmc_core_pltdrv(E) intel_pmc_core(E) kvm_intel(E) kvm(E) qxl(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) drm_ttm_helper(E) ttm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) snd_hda_codec_generic(E) ledtrig_audio(E) drm_kms_helper(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) ghash_clmulni_intel(E) rapl(E) snd_pcm(E) drm(E) pcspkr(E) snd_timer(E) snd(E) i2c_i801(E) i2c_smbus(E) soundcore(E) lpc_ich(E) virtio_balloon(E) joydev(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) xfs(E) libcrc32c(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_scsi(E)
[   71.320342]  virtio_blk(E) virtio_net(E) virtio_console(E) net_failover(E) failover(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[   71.320667] CPU: 0 PID: 1978 Comm: lod0000_rec0000 Tainted: G           OE     5.10.9-1.ldiskfs.el8.x86_64 #1
[   71.320700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[   71.320736] RIP: 0010:jbd2_journal_dirty_metadata+0x201/0x220 [jbd2]
[   71.320759] Code: 07 00 0f 1f 40 00 e9 50 ff ff ff 49 39 6c 24 30 0f 84 83 fe ff ff 0f 0b 4d 8b 4a 70 4c 39 cd 0f 84 36 ff ff ff e9 45 a0 00 00 <0f> 0b 41 bd e4 ff ff ff e9 24 ff ff ff 0f 0b 41 bd 8b ff ff ff e9
[   71.320819] RSP: 0018:ffff9ec58152bc28 EFLAGS: 00010246
[   71.320839] RAX: 0000000000000001 RBX: ffff89e9c47d4c78 RCX: 00000000000003fc
[   71.320863] RDX: 0000000000000000 RSI: ffff89e9c307e888 RDI: ffff89e9c314a2d8
[   71.320887] RBP: ffff89e9c47d3500 R08: ffff89e9c307e888 R09: ffff89e9ede0f000
[   71.320921] R10: ffff89e9c7a3e800 R11: ffff89e9c314a2d0 R12: ffff89e9c314a2d0
[   71.321977] R13: 0000000000000000 R14: ffff89e9c307e888 R15: ffff89e9c314a2d8
[   71.323182] FS:  0000000000000000(0000) GS:ffff89ea3bc00000(0000) knlGS:0000000000000000
[   71.324359] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.325367] CR2: 0000555bb1fd9e48 CR3: 0000000108048002 CR4: 0000000000370ef0
[   71.325943] Call Trace:
[   71.326543]  __ldiskfs_handle_dirty_metadata+0x4f/0x1a0 [ldiskfs]
[   71.327120]  ldiskfs_getblk+0x13d/0x1d0 [ldiskfs]
[   71.327672]  ldiskfs_bread+0xb/0x70 [ldiskfs]
[   71.328261]  osd_write+0x6ca/0xc70 [osd_ldiskfs]
[   71.328831]  dt_record_write+0x32/0x110 [obdclass]
[   71.329402]  llog_osd_put_cat_list+0x799/0x930 [obdclass]
[   71.329942]  lod_sub_prep_llog+0x6d9/0x81a [lod]
[   71.330491]  lod_sub_recovery_thread+0xed/0xb60 [lod]
[   71.331047]  ? __schedule+0x299/0x7a0
[   71.331586]  ? lod_trans_stop+0x360/0x360 [lod]
[   71.332128]  kthread+0x116/0x130
[   71.332664]  ? kthread_park+0x80/0x80
[   71.333220]  ret_from_fork+0x22/0x30
[   71.333764] ---[ end trace c7b3deb5174e8df2 ]---
[   71.334314] ------------[ cut here ]------------
[   71.334869] WARNING: CPU: 0 PID: 1978 at /home/shaun/lustre-release/ldiskfs/ext4_jbd2.c:333 __ldiskfs_handle_dirty_metadata+0x110/0x1a0 [ldiskfs]
[   71.335930] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sunrpc(E) ldiskfs(OE) mbcache(E) jbd2(E) libcfs(OE) fuse(E) rfkill(E) intel_rapl_msr(E) intel_rapl_common(E) intel_pmc_core_pltdrv(E) intel_pmc_core(E) kvm_intel(E) kvm(E) qxl(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) drm_ttm_helper(E) ttm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) snd_hda_codec_generic(E) ledtrig_audio(E) drm_kms_helper(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) ghash_clmulni_intel(E) rapl(E) snd_pcm(E) drm(E) pcspkr(E) snd_timer(E) snd(E) i2c_i801(E) i2c_smbus(E) soundcore(E) lpc_ich(E) virtio_balloon(E) joydev(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) xfs(E) libcrc32c(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_scsi(E)
[   71.335966]  virtio_blk(E) virtio_net(E) virtio_console(E) net_failover(E) failover(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[   71.342010] CPU: 0 PID: 1978 Comm: lod0000_rec0000 Tainted: G        W  OE     5.10.9-1.ldiskfs.el8.x86_64 #1
[   71.342642] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[   71.343263] RIP: 0010:__ldiskfs_handle_dirty_metadata+0x110/0x1a0 [ldiskfs]
[   71.343825] Code: 01 40 e9 4e ff ff ff f0 80 4b 01 20 e9 38 ff ff ff f0 80 0b 01 4d 85 ed 75 8f 48 89 df 45 31 e4 e8 55 94 de f6 e9 65 ff ff ff <0f> 0b 48 c7 c2 c0 56 9d c0 45 89 e0 48 89 e9 44 89 fe 4c 89 f7 e8
[   71.344917] RSP: 0018:ffff9ec58152bc68 EFLAGS: 00010286
[   71.345502] RAX: ffff89e9c7a3e800 RBX: ffff89e9c307e888 RCX: 00000000000003fc
[   71.346048] RDX: 0000000000000000 RSI: ffff89e9c307e888 RDI: ffff89e9c314a2d8
[   71.346603] RBP: ffff89e9c47d4c78 R08: ffff89e9c307e888 R09: ffff89e9ede0f000
[   71.347144] R10: ffff89e9c7a3e800 R11: ffff89e9c314a2d0 R12: 00000000ffffffe4
[   71.347685] R13: ffff89e9c313cf80 R14: ffffffffc09d7030 R15: 0000000000000376
[   71.348215] FS:  0000000000000000(0000) GS:ffff89ea3bc00000(0000) knlGS:0000000000000000
[   71.348925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.349757] CR2: 0000555bb1fd9e48 CR3: 0000000108048002 CR4: 0000000000370ef0
[   71.350602] Call Trace:
[   71.351183]  ldiskfs_getblk+0x13d/0x1d0 [ldiskfs]
[   71.351799]  ldiskfs_bread+0xb/0x70 [ldiskfs]
[   71.352356]  osd_write+0x6ca/0xc70 [osd_ldiskfs]
[   71.353261]  dt_record_write+0x32/0x110 [obdclass]
[   71.354052]  llog_osd_put_cat_list+0x799/0x930 [obdclass]
[   71.354867]  lod_sub_prep_llog+0x6d9/0x81a [lod]
[   71.355578]  lod_sub_recovery_thread+0xed/0xb60 [lod]
[   71.356183]  ? __schedule+0x299/0x7a0
[   71.356790]  ? lod_trans_stop+0x360/0x360 [lod]
[   71.357318]  kthread+0x116/0x130
[   71.357872]  ? kthread_park+0x80/0x80
[   71.358576]  ret_from_fork+0x22/0x30
[   71.359418] ---[ end trace c7b3deb5174e8df3 ]---
[   71.360306] LDISKFS-fs: ldiskfs_getblk:886: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[   71.361223] LDISKFS-fs error (device vdb): ldiskfs_getblk:886: inode #107: block 136305: comm lod0000_rec0000: journal_dirty_metadata failed: handle type 0 started at line 1979, credits 7/0, errcode -28
[   71.369909] Aborting journal on device vdb-8.
[   71.374257] LDISKFS-fs (vdb): Remounting filesystem read-only
[   71.375334] LustreError: 1978:0:(osd_io.c:2128:osd_ldiskfs_write_record()) lustre-MDT0000: error reading offset 0 (block 0, size 32, offs 0), credits 7/1: rc = -28
[   71.379816] LustreError: 1927:0:(osd_handler.c:1787:osd_trans_commit_cb()) transaction @0x000000001e4cbe3c commit error: 2
[   71.382743] LDISKFS-fs error (device vdb) in osd_trans_stop:2089: error 28
[   71.385258] LustreError: 1978:0:(osd_handler.c:2092:osd_trans_stop()) lustre-MDT0000: failed to stop transaction: rc = -28
[   71.386192] LustreError: 1978:0:(lod_dev.c:424:lod_sub_recovery_thread()) lustre-MDT0000-osd get update log failed: rc = -28
[   71.742336] Lustre: DEBUG MARKER: llmount : @@@@@@ FAIL: mds1 is in a unhealthy state
[   72.433338] Lustre: DEBUG MARKER: mds8.test: executing check_logdir /tmp/test_logs/1648563277

Warning at fs/jbd2/transaction.c:1512 jbd2_journal_dirty_metadata+0x201/0x220 [jbd2]
https://elixir.bootlin.com/linux/v5.10.9/source/fs/jbd2/transaction.c#L1512

Warning in patched ext4, ldiskfs/ext2_jbd.c:333 :

		err = jbd2_journal_dirty_metadata(handle, bh);
		/* Errors can only happen due to aborted journal or a nasty bug */
		if (!is_handle_aborted(handle) && WARN_ON_ONCE(err)) { <---- WARNING HERE
			ldiskfs_journal_abort_handle(where, line, __func__, bh,
						  handle, err);
			if (inode == NULL) {
				pr_err("LDISKFS: jbd2_journal_dirty_metadata "
				       "failed: handle type %u started at "
				       "line %u, credits %u/%u, errcode %d",
				       handle->h_type,
				       handle->h_line_no,
				       handle->h_requested_credits,
				       jbd2_handle_buffer_credits(handle), err);
				return err;
			}


 Comments   
Comment by Gerrit Updater [ 29/Mar/22 ]

"Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46943
Subject: LU-15703 ldiskfs: disable ext4/jbd2 fast commit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7eac56305da26da4bcf7ac59d2090e703586acef

Comment by Patrick Farrell [ 29/Mar/22 ]

Can you say more about why this breaks with ldiskfs?

Comment by Shaun Tancheff [ 31/Mar/22 ]

It seems there is also changes in jbd2 journal credits that break lustre. Someone can probably come up with something more reasonable than just padding the credit calculation for new inodes / xattrs.
As of now I can mount / test on a small cluster running 5.10.9, 5.14.21, and 5.16.9.

Comment by Gerrit Updater [ 24/Jun/22 ]

"Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47757
Subject: LU-15703 ldiskfs: Disable unused fast commit buffer
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: ce764be32c0d8cdda604eb6cff1b8fce643ddf57

Comment by Gerrit Updater [ 06/Jul/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46943/
Subject: LU-15703 ldiskfs: Disable unused fast commit buffer
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ed80c3f64317a87ed10919cedbf2d65b36abcdcb

Comment by Gerrit Updater [ 06/Jul/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47757/
Subject: LU-15703 ldiskfs: Disable unused fast commit buffer
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: 2fd033c6848f54a06ca9379789f37f73de80dce9

Generated at Sat Feb 10 03:20:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.