
Introduce / use min journal credit for ldiskfs

Details

    • Bug
    • Resolution: Fixed
    • Minor

    Description

      lustre/ldiskfs consumes more journal credits than ext4. Try to play nice with jbd2 and increase the requested journal credits as needed.
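
The failure mode behind this ticket can be sketched with a toy model. This is a minimal illustration, not the actual patch: `Handle`, `start_handle`, `run`, and the `MIN_CREDITS` value of 16 are hypothetical names and numbers chosen for the example. It assumes only what the kernel log in this ticket shows, namely that dirtying a metadata buffer on a handle with no credits left fails with -ENOSPC (errcode -28).

```python
import errno

# Hypothetical floor; the value the real patch uses is not stated in this ticket.
MIN_CREDITS = 16


class Handle:
    """Toy stand-in for a journal handle: a fixed budget of buffer credits."""

    def __init__(self, requested):
        self.requested = requested
        self.remaining = requested

    def dirty_metadata(self):
        """Consume one credit; return -ENOSPC once the budget is exhausted."""
        if self.remaining <= 0:
            return -errno.ENOSPC
        self.remaining -= 1
        return 0


def start_handle(requested, min_credits=0):
    """Start a handle, raising the request to min_credits if it is lower."""
    return Handle(max(requested, min_credits))


def run(handle, blocks):
    """Dirty `blocks` metadata buffers; stop at the first error."""
    for _ in range(blocks):
        rc = handle.dirty_metadata()
        if rc:
            return rc
    return 0
```

With a request of 7 credits and 8 buffers to dirty, `run(start_handle(7), 8)` fails on the eighth buffer, while `run(start_handle(7, MIN_CREDITS), 8)` succeeds because the request was raised to the floor before the handle started.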

          Activity

            [LU-15726] Introduce / use min journal credit for ldiskfs

            stancheff Shaun Tancheff added a comment - Resolved with LU-13135

            simmonsja James A Simmons added a comment - I noticed the same thing for Ubuntu 5.15 kernels. Patch 51776 fixes this issue. Shaun, can you close this ticket?
            xinliang Xinliang Liu added a comment - - edited

            After a long bisect on branch master, I found that branch b2_15 with commit ef90a02d12 runs on kernel 5.10 without crashing.

            But I don't know why. Does this issue still exist on 5.10+ kernels for non-root-owned files? Does anyone have any ideas on this? @Alex Zhuravlev

            Anyway, I cherry-picked it to branch b2_15 and noted that it is related to this issue: https://review.whamcloud.com/c/fs/lustre-release/+/51776

            xinliang Xinliang Liu added a comment -

            I tried the latest b2_15 branch (2.15.3-RC1) on a 5.10 kernel. Without patch https://review.whamcloud.com/47009 it can't mount a client (all-in-one) or start the MDS (multi-node). So branch b2_15 must be missing one or more patches from branch master that fix this credit-related issue. Which ones?

            Here is the warning log from the kernel:

            [ 8189.170458] ------------[ cut here ]------------
            [ 8189.170504] WARNING: CPU: 0 PID: 115468 at /tmp/rpmbuild-lustre-openeuler-AL963B8M/BUILD/lustre-2.15.3_RC1_5_g4aaae55_dirty/ldiskfs/ext4_jbd2.c:336 __ldiskfs_handle_dirty_metadata+0x18c/0x2e0 [ldiskfs]
            [ 8189.170506] Modules linked in: ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) osd_ldiskfs(OE) lquota(OE) loop ldiskfs(OE) lustre(OE) obdecho(OE) mgc(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod crc32_generic rfkill sunrpc virtio_balloon vfat fat sch_fq_codel fuse ext4 mbcache jbd2 virtio_gpu virtio_net virtio_dma_buf net_failover virtio_blk failover ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_pci virtio_pci_modern_dev virtio_mmio virtio_rng virtio virtio_ring aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher [last unloaded: libcfs]
            [ 8189.170583] CPU: 0 PID: 115468 Comm: mdt00_001 Kdump: loaded Tainted: G        W  OE     5.10.0-152.0.0.78.oe2203sp2.aarch64 #1
            [ 8189.170585] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
            [ 8189.170588] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
            [ 8189.170610] pc : __ldiskfs_handle_dirty_metadata+0x18c/0x2e0 [ldiskfs]
            [ 8189.170631] lr : __ldiskfs_handle_dirty_metadata+0x9c/0x2e0 [ldiskfs]
            [ 8189.170632] sp : ffff8000109e36a0
            [ 8189.170634] x29: ffff8000109e36c0 x28: 0000000000000000
            [ 8189.170638] x27: 0000000000000002 x26: 0000000000000001
            [ 8189.170641] x25: 0000000000000001 x24: ffff0bf31c309af8
            [ 8189.170645] x23: 0000000000000372 x22: ffffde4408547508
            [ 8189.170648] x21: 00000000ffffffe4 x20: ffff0bf318c45a10
            [ 8189.170651] x19: ffff0bf31c1ff1a0 x18: 0000000000000020
            [ 8189.170654] x17: 0000000000000000 x16: ffffde4435035f10
            [ 8189.170658] x15: ffffffffffffffff x14: 0000000000000000
            [ 8189.170661] x13: 0000000000191000 x12: 0000000000000000
            [ 8189.170665] x11: 0000000000000000 x10: 00000000ffffffff
            [ 8189.170668] x9 : ffffde44084c254c x8 : ffff0bf34f812000
            [ 8189.170671] x7 : 0000000000000000 x6 : 0000000000000000
            [ 8189.170674] x5 : 61c8864680b583eb x4 : 0000000000116011
            [ 8189.170678] x3 : ffff0bf31f327800 x2 : 0000000000000001
            [ 8189.170681] x1 : 00000000007be000 x0 : 0000000000000030
            [ 8189.170685] Call trace:
            [ 8189.170706]  __ldiskfs_handle_dirty_metadata+0x18c/0x2e0 [ldiskfs]
            [ 8189.170727]  ldiskfs_getblk+0x150/0x210 [ldiskfs]
            [ 8189.170748]  ldiskfs_bread+0x1c/0xd4 [ldiskfs]
            [ 8189.170765]  osd_ldiskfs_write_record+0x4a4/0x8fc [osd_ldiskfs]
            [ 8189.170779]  osd_write+0x104/0x6e4 [osd_ldiskfs]
            [ 8189.170842]  dt_record_write+0x38/0xf0 [obdclass]
            [ 8189.170943]  tgt_client_data_write+0x12c/0x180 [ptlrpc]
            [ 8189.171012]  tgt_client_data_update+0x4fc/0x86c [ptlrpc]
            [ 8189.171079]  tgt_client_new+0x610/0xcb0 [ptlrpc]
            [ 8189.171117]  mdt_obd_connect+0x5b0/0x940 [mdt]
            [ 8189.171370]  target_handle_connect+0x10e4/0x3b00 [ptlrpc]
            [ 8189.171465]  tgt_request_handle+0x174/0xd9c [ptlrpc]
            [ 8189.171545]  ptlrpc_server_handle_request.isra.0+0x3d4/0x11fc [ptlrpc]
            [ 8189.171613]  ptlrpc_main+0xdb0/0x1670 [ptlrpc]
            [ 8189.171620]  kthread+0x108/0x13c
            [ 8189.171624]  ret_from_fork+0x10/0x18
            [ 8189.171626] ---[ end trace ce1929bc2ec68092 ]---
            [ 8189.171631] LDISKFS-fs: ldiskfs_getblk:882: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
            [ 8189.174222] LDISKFS-fs error (device dm-0): ldiskfs_getblk:882: inode #91: block 31655: comm mdt00_001: journal_dirty_metadata failed: handle type 0 started at line 1982, credits 7/0, errcode -28
            [ 8189.178341] Aborting journal on device dm-0-8.
            [ 8189.179762] LDISKFS-fs (dm-0): Remounting filesystem read-only
            [ 8189.181278] LustreError: 115468:0:(osd_io.c:2123:osd_ldiskfs_write_record()) lustre-MDT0000: error reading offset 8192 (block 2, size 128, offs 8192), credits 7/1: rc = -28
            [ 8189.184623] LDISKFS-fs error (device dm-0) in osd_trans_stop:2092: error 28
            [ 8189.184635] LustreError: 115449:0:(osd_handler.c:1790:osd_trans_commit_cb()) transaction @0x00000000ce92c156 commit error: 2
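
When triaging logs like the one above, it can help to pull the credit counters and error code out of the `journal_dirty_metadata failed` line. The following is a minimal sketch under stated assumptions: `parse_credit_error` is a hypothetical helper, and reading the `credits 7/0` pair as requested/remaining credits is an assumption about the message format rather than something this ticket states.

```python
import errno
import re

# Matches the tail of the LDISKFS-fs error line, e.g. "credits 7/0, errcode -28".
PATTERN = re.compile(r"credits (\d+)/(\d+), errcode (-?\d+)")


def parse_credit_error(line):
    """Return requested/remaining credits and the errno name, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    requested, remaining, err = (int(group) for group in match.groups())
    return {
        "requested": requested,   # assumed: credits asked for at handle start
        "remaining": remaining,   # assumed: credits left when the call failed
        "errno": errno.errorcode.get(-err, "UNKNOWN"),
    }
```

Applied to the `ldiskfs_getblk:882` error line in the log, this yields 7 requested credits, 0 remaining, and `ENOSPC`, which is consistent with the handle running out of credits rather than the device running out of space.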
             

            simmonsja James A Simmons added a comment - Starting with 5.10 kernels, the way xattr credits are handled has changed, so ext4-xattr-disable-credits-check.patch is no longer enough to work around this issue. We need a real solution to this problem, so I'm reopening this ticket.

            stancheff Shaun Tancheff added a comment - Not an improvement

            gerrit Gerrit Updater added a comment -

            "Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47009
            Subject: LU-15726 ldiskfs: Introduce and use min journal credits
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0bcea9046aec1bdf022621395ca7693c50132cac

            People

              stancheff Shaun Tancheff
              stancheff Shaun Tancheff
              Votes: 0
              Watchers: 6
