Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
None
-
3
-
9223372036854775807
Description
Fast commits for ext4 - LWN.net
https://lwn.net/Articles/842385/#:~:text=The%20Linux%205.10%20release%20included,%2C%20lighter%2Dweight%20journaling%20method.
The Linux 5.10 release included a change that is expected to significantly increase the performance of the ext4 filesystem; it goes by the name "fast commits" and introduces a new, lighter-weight journaling method.
Unfortunately this breaks ldiskfs. Disable ext4/jbd2 fast commit
Error (on MDS):
[ 59.357528] Lustre: DEBUG MARKER: mds8.test: executing set_hostid [ 64.281515] LDISKFS-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro [ 69.918052] LDISKFS-fs (vdb): mounted filesystem with ordered data mode. Opts: errors=remount-ro [ 69.960957] RPC: Registered named UNIX socket transport module. [ 69.960974] RPC: Registered udp transport module. [ 69.960984] RPC: Registered tcp transport module. [ 69.960993] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 69.984978] Lustre: Lustre: Build Version: 2.14.54_522_g3f584cf_dirty [ 70.021102] LNet: Added LNI 192.168.122.175@tcp [8/256/0/180] [ 70.021438] LNet: Accept secure, port 988 [ 70.084523] LDISKFS-fs (vdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,user_xattr,no_mbcache,nodelalloc [ 70.179250] Lustre: ctl-lustre-MDT0000: No data found on store. Initialize space: rc = -61 [ 71.214201] Lustre: lustre-MDT0000: new disk, initializing [ 71.291209] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180 [ 71.297880] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt [ 71.320210] ------------[ cut here ]------------ [ 71.320249] WARNING: CPU: 0 PID: 1978 at fs/jbd2/transaction.c:1512 jbd2_journal_dirty_metadata+0x201/0x220 [jbd2] [ 71.320283] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sunrpc(E) ldiskfs(OE) mbcache(E) jbd2(E) libcfs(OE) fuse(E) rfkill(E) intel_rapl_msr(E) intel_rapl_common(E) intel_pmc_core_pltdrv(E) intel_pmc_core(E) kvm_intel(E) kvm(E) qxl(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) drm_ttm_helper(E) ttm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) snd_hda_codec_generic(E) ledtrig_audio(E) drm_kms_helper(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) ghash_clmulni_intel(E) rapl(E) snd_pcm(E) drm(E) pcspkr(E) snd_timer(E) snd(E) i2c_i801(E) i2c_smbus(E) soundcore(E) lpc_ich(E) virtio_balloon(E) joydev(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) xfs(E) libcrc32c(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_scsi(E) [ 71.320342] virtio_blk(E) virtio_net(E) virtio_console(E) net_failover(E) failover(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [ 71.320667] CPU: 0 PID: 1978 Comm: lod0000_rec0000 Tainted: G OE 5.10.9-1.ldiskfs.el8.x86_64 #1 [ 71.320700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 [ 71.320736] RIP: 0010:jbd2_journal_dirty_metadata+0x201/0x220 [jbd2] [ 71.320759] Code: 07 00 0f 1f 40 00 e9 50 ff ff ff 49 39 6c 24 30 0f 84 83 fe ff ff 0f 0b 4d 8b 4a 70 4c 39 cd 0f 84 36 ff ff ff e9 45 a0 00 00 <0f> 0b 41 bd e4 ff ff ff e9 24 ff ff ff 0f 0b 41 bd 8b ff ff ff e9 [ 71.320819] RSP: 0018:ffff9ec58152bc28 EFLAGS: 00010246 [ 71.320839] RAX: 0000000000000001 RBX: ffff89e9c47d4c78 RCX: 00000000000003fc [ 71.320863] RDX: 0000000000000000 RSI: ffff89e9c307e888 RDI: ffff89e9c314a2d8 [ 71.320887] RBP: ffff89e9c47d3500 R08: ffff89e9c307e888 R09: ffff89e9ede0f000 [ 71.320921] R10: ffff89e9c7a3e800 R11: ffff89e9c314a2d0 R12: ffff89e9c314a2d0 [ 71.321977] R13: 0000000000000000 R14: ffff89e9c307e888 R15: ffff89e9c314a2d8 [ 71.323182] FS: 0000000000000000(0000) GS:ffff89ea3bc00000(0000) knlGS:0000000000000000 [ 71.324359] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.325367] CR2: 0000555bb1fd9e48 CR3: 0000000108048002 CR4: 0000000000370ef0 [ 71.325943] Call Trace: [ 71.326543] __ldiskfs_handle_dirty_metadata+0x4f/0x1a0 [ldiskfs] [ 71.327120] ldiskfs_getblk+0x13d/0x1d0 [ldiskfs] [ 71.327672] ldiskfs_bread+0xb/0x70 [ldiskfs] [ 71.328261] osd_write+0x6ca/0xc70 [osd_ldiskfs] [ 71.328831] dt_record_write+0x32/0x110 [obdclass] [ 71.329402] llog_osd_put_cat_list+0x799/0x930 [obdclass] [ 71.329942] lod_sub_prep_llog+0x6d9/0x81a [lod] [ 71.330491] lod_sub_recovery_thread+0xed/0xb60 [lod] [ 71.331047] ? __schedule+0x299/0x7a0 [ 71.331586] ? lod_trans_stop+0x360/0x360 [lod] [ 71.332128] kthread+0x116/0x130 [ 71.332664] ? kthread_park+0x80/0x80 [ 71.333220] ret_from_fork+0x22/0x30 [ 71.333764] ---[ end trace c7b3deb5174e8df2 ]--- [ 71.334314] ------------[ cut here ]------------ [ 71.334869] WARNING: CPU: 0 PID: 1978 at /home/shaun/lustre-release/ldiskfs/ext4_jbd2.c:333 __ldiskfs_handle_dirty_metadata+0x110/0x1a0 [ldiskfs] [ 71.335930] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sunrpc(E) ldiskfs(OE) mbcache(E) jbd2(E) libcfs(OE) fuse(E) rfkill(E) intel_rapl_msr(E) intel_rapl_common(E) intel_pmc_core_pltdrv(E) intel_pmc_core(E) kvm_intel(E) kvm(E) qxl(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) drm_ttm_helper(E) ttm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) snd_hda_codec_generic(E) ledtrig_audio(E) drm_kms_helper(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) ghash_clmulni_intel(E) rapl(E) snd_pcm(E) drm(E) pcspkr(E) snd_timer(E) snd(E) i2c_i801(E) i2c_smbus(E) soundcore(E) lpc_ich(E) virtio_balloon(E) joydev(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) xfs(E) libcrc32c(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_scsi(E) [ 71.335966] virtio_blk(E) virtio_net(E) virtio_console(E) net_failover(E) failover(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [ 71.342010] CPU: 0 PID: 1978 Comm: lod0000_rec0000 Tainted: G W OE 5.10.9-1.ldiskfs.el8.x86_64 #1 [ 71.342642] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 [ 71.343263] RIP: 0010:__ldiskfs_handle_dirty_metadata+0x110/0x1a0 [ldiskfs] [ 71.343825] Code: 01 40 e9 4e ff ff ff f0 80 4b 01 20 e9 38 ff ff ff f0 80 0b 01 4d 85 ed 75 8f 48 89 df 45 31 e4 e8 55 94 de f6 e9 65 ff ff ff <0f> 0b 48 c7 c2 c0 56 9d c0 45 89 e0 48 89 e9 44 89 fe 4c 89 f7 e8 [ 71.344917] RSP: 0018:ffff9ec58152bc68 EFLAGS: 00010286 [ 71.345502] RAX: ffff89e9c7a3e800 RBX: ffff89e9c307e888 RCX: 00000000000003fc [ 71.346048] RDX: 0000000000000000 RSI: ffff89e9c307e888 RDI: ffff89e9c314a2d8 [ 71.346603] RBP: ffff89e9c47d4c78 R08: ffff89e9c307e888 R09: ffff89e9ede0f000 [ 71.347144] R10: ffff89e9c7a3e800 R11: ffff89e9c314a2d0 R12: 00000000ffffffe4 [ 71.347685] R13: ffff89e9c313cf80 R14: ffffffffc09d7030 R15: 0000000000000376 [ 71.348215] FS: 0000000000000000(0000) GS:ffff89ea3bc00000(0000) knlGS:0000000000000000 [ 71.348925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.349757] CR2: 0000555bb1fd9e48 CR3: 0000000108048002 CR4: 0000000000370ef0 [ 71.350602] Call Trace: [ 71.351183] ldiskfs_getblk+0x13d/0x1d0 [ldiskfs] [ 71.351799] ldiskfs_bread+0xb/0x70 [ldiskfs] [ 71.352356] osd_write+0x6ca/0xc70 [osd_ldiskfs] [ 71.353261] dt_record_write+0x32/0x110 [obdclass] [ 71.354052] llog_osd_put_cat_list+0x799/0x930 [obdclass] [ 71.354867] lod_sub_prep_llog+0x6d9/0x81a [lod] [ 71.355578] lod_sub_recovery_thread+0xed/0xb60 [lod] [ 71.356183] ? __schedule+0x299/0x7a0 [ 71.356790] ? lod_trans_stop+0x360/0x360 [lod] [ 71.357318] kthread+0x116/0x130 [ 71.357872] ? kthread_park+0x80/0x80 [ 71.358576] ret_from_fork+0x22/0x30 [ 71.359418] ---[ end trace c7b3deb5174e8df3 ]--- [ 71.360306] LDISKFS-fs: ldiskfs_getblk:886: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata [ 71.361223] LDISKFS-fs error (device vdb): ldiskfs_getblk:886: inode #107: block 136305: comm lod0000_rec0000: journal_dirty_metadata failed: handle type 0 started at line 1979, credits 7/0, errcode -28 [ 71.369909] Aborting journal on device vdb-8. [ 71.374257] LDISKFS-fs (vdb): Remounting filesystem read-only [ 71.375334] LustreError: 1978:0:(osd_io.c:2128:osd_ldiskfs_write_record()) lustre-MDT0000: error reading offset 0 (block 0, size 32, offs 0), credits 7/1: rc = -28 [ 71.379816] LustreError: 1927:0:(osd_handler.c:1787:osd_trans_commit_cb()) transaction @0x000000001e4cbe3c commit error: 2 [ 71.382743] LDISKFS-fs error (device vdb) in osd_trans_stop:2089: error 28 [ 71.385258] LustreError: 1978:0:(osd_handler.c:2092:osd_trans_stop()) lustre-MDT0000: failed to stop transaction: rc = -28 [ 71.386192] LustreError: 1978:0:(lod_dev.c:424:lod_sub_recovery_thread()) lustre-MDT0000-osd get update log failed: rc = -28 [ 71.742336] Lustre: DEBUG MARKER: llmount : @@@@@@ FAIL: mds1 is in a unhealthy state [ 72.433338] Lustre: DEBUG MARKER: mds8.test: executing check_logdir /tmp/test_logs/1648563277
Warning at fs/jbd2/transaction.c:1512 jbd2_journal_dirty_metadata+0x201/0x220 [jbd2]
https://elixir.bootlin.com/linux/v5.10.9/source/fs/jbd2/transaction.c#L1512
Warning in patched ext4, ldiskfs/ext2_jbd.c:333 :
err = jbd2_journal_dirty_metadata(handle, bh); /* Errors can only happen due to aborted journal or a nasty bug */ if (!is_handle_aborted(handle) && WARN_ON_ONCE(err)) { <---- WARNING HERE ldiskfs_journal_abort_handle(where, line, __func__, bh, handle, err); if (inode == NULL) { pr_err("LDISKFS: jbd2_journal_dirty_metadata " "failed: handle type %u started at " "line %u, credits %u/%u, errcode %d", handle->h_type, handle->h_line_no, handle->h_requested_credits, jbd2_handle_buffer_credits(handle), err); return err; }
Attachments
Issue Links
- is related to
-
LU-14678 ldiskfs fast commit feature
- Open