[LU-10762] sanity test 300k transaction accounting problem

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0

    Description

      I added DNE testing to my testbed, and test 300k seems to be having problems on ldiskfs.

      It starts like this:

      [13013.691307] Lustre: DEBUG MARKER: == sanity test 300k: test large striped directory ==================================================== 18:35:09 (1520206509)
      [13014.055529] Lustre: *** cfs_fail_loc=1703, val=0***
      [13014.098222] Lustre: 24226:0:(osd_internal.h:1151:osd_trans_exec_op()) lustre-MDT0000: opcode 3: before 512 < left 1480, rollback = 4
      [13014.099829] Lustre: 24226:0:(osd_internal.h:1151:osd_trans_exec_op()) Skipped 23 previous similar messages
      [13014.101329] Lustre: 24226:0:(osd_handler.c:1741:osd_trans_dump_creds())   create: 185/1480/0, destroy: 0/0/0
      [13014.102702] Lustre: 24226:0:(osd_handler.c:1741:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13014.103935] Lustre: 24226:0:(osd_handler.c:1748:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 346/33383/0
      [13014.105479] Lustre: 24226:0:(osd_handler.c:1748:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13014.106887] Lustre: 24226:0:(osd_handler.c:1758:osd_trans_dump_creds())   write: 53/1648/0, punch: 0/0/0, quota 2/2/0
      [13014.107836] Lustre: 24226:0:(osd_handler.c:1758:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13014.112440] Lustre: 24226:0:(osd_handler.c:1765:osd_trans_dump_creds())   insert: 1055/23885/0, delete: 0/0/0
      [13014.114028] Lustre: 24226:0:(osd_handler.c:1765:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13014.115799] Lustre: 24226:0:(osd_handler.c:1772:osd_trans_dump_creds())   ref_add: 685/685/0, ref_del: 0/0/0
      [13014.132704] Lustre: 24226:0:(osd_handler.c:1772:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13017.644534] ------------[ cut here ]------------
      [13017.650823] WARNING: at /home/green/git/lustre-release/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1d2/0x230 [ldiskfs]()
      [13017.671187] Modules linked in: brd lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate mbcache jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper drm i2c_piix4 ata_piix virtio_balloon pcspkr serio_raw virtio_blk i2c_core virtio_console libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: obdecho]
      [13017.687393] CPU: 6 PID: 24226 Comm: mdt03_004 Tainted: P           OE  ------------   3.10.0-debug #2
      [13017.688706] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13017.689495]  0000000000000000 000000009b5bd8bc ffff880213757438 ffffffff816fd3e4
      [13017.691291]  ffff880213757470 ffffffff81076250 ffff880076c48f88 ffff880237e11fa0
      [13017.692912]  ffff8800855c39f8 ffffffffa0b48d9c 0000000000000323 ffff880213757480
      [13017.694222] Call Trace:
      [13017.694832]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
      [13017.695505]  [<ffffffff81076250>] warn_slowpath_common+0x70/0xb0
      [13017.696204]  [<ffffffff8107639a>] warn_slowpath_null+0x1a/0x20
      [13017.697091]  [<ffffffffa0b38192>] __ldiskfs_handle_dirty_metadata+0x1d2/0x230 [ldiskfs]
      [13017.698752]  [<ffffffffa0b17602>] ldiskfs_getblk+0x142/0x210 [ldiskfs]
      [13017.699645]  [<ffffffffa0b176f7>] ldiskfs_bread+0x27/0xe0 [ldiskfs]
      [13017.700518]  [<ffffffffa0b091a1>] ldiskfs_append+0x81/0x150 [ldiskfs]
      [13017.701366]  [<ffffffffa0b1022f>] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs]
      [13017.702238]  [<ffffffffa0b103de>] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs]
      [13017.703247]  [<ffffffffa0b8559f>] osd_add_dot_dotdot_internal.isra.73+0x5f/0x80 [osd_ldiskfs]
      [13017.706196]  [<ffffffffa0b9a962>] osd_index_ea_insert+0xa22/0x11c0 [osd_ldiskfs]
      [13017.707436]  [<ffffffffa0dbb1b2>] lod_sub_insert+0x1c2/0x340 [lod]
      [13017.708022]  [<ffffffffa0da0387>] lod_xattr_set_lmv.isra.49.constprop.66+0x647/0xe30 [lod]
      [13017.709667]  [<ffffffffa0dae0cf>] lod_dir_striping_create_internal+0x46f/0x950 [lod]
      [13017.712608]  [<ffffffffa0dafb2f>] lod_xattr_set+0x25f/0x860 [lod]
      [13017.713517]  [<ffffffffa0daf8d0>] ? lod_layout_del+0xab0/0xab0 [lod]
      [13017.714535]  [<ffffffffa0c5e6da>] mdo_xattr_set+0x7a/0x180 [mdd]
      [13017.715478]  [<ffffffffa0c76981>] ? mdd_env_info+0x21/0x60 [mdd]
      [13017.716396]  [<ffffffffa0c649f6>] mdd_create_object+0x416/0x980 [mdd]
      [13017.717305]  [<ffffffffa0c6c383>] mdd_create+0xce3/0x13a0 [mdd]
      [13017.718352]  [<ffffffffa0ce7296>] mdt_create+0x846/0xbb0 [mdt]
      [13017.719404]  [<ffffffffa03d1b89>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [13017.720412]  [<ffffffffa0406481>] ? lprocfs_job_stats_log+0xd1/0x600 [obdclass]
      [13017.726742]  [<ffffffffa0ce776b>] mdt_reint_create+0x16b/0x350 [mdt]
      [13017.727541]  [<ffffffffa0ce8cc0>] mdt_reint_rec+0x80/0x210 [mdt]
      [13017.728262]  [<ffffffffa0cc87bb>] mdt_reint_internal+0x5fb/0x990 [mdt]
      [13017.728967]  [<ffffffffa0cd3587>] mdt_reint+0x67/0x140 [mdt]
      [13017.730200]  [<ffffffffa063b28b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [13017.730981]  [<ffffffffa05e1ce1>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [13017.732359]  [<ffffffffa05e5a98>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [13017.733076]  [<ffffffffa05e5040>] ? ptlrpc_register_service+0xeb0/0xeb0 [ptlrpc]
      [13017.748161]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [13017.750350]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [13017.751067]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [13017.751743]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [13017.752450] ---[ end trace 4ba61d7be859e71c ]---
      [13017.753099] LDISKFS-fs: ldiskfs_getblk:803: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
      [13017.758373] LDISKFS-fs error (device loop0): ldiskfs_getblk:803: inode #26416: block 17641: comm mdt03_004: journal_dirty_metadata failed: handle type 0 started at line 1837, credits 512/0, errcode -28
      [13017.761414] Aborting journal on device loop0-8.
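      The `errcode -28` (`error 28`) in the messages above is `-ENOSPC`: the jbd2 handle exhausted the 512 credits reserved when the transaction was started, so the journal was aborted. Kernel errno values can be decoded quickly (assuming python3 is available on the node):

```shell
# Decode errno 28 as reported by ldiskfs_getblk (Linux errno 28 is ENOSPC)
python3 -c 'import errno, os; print(errno.errorcode[28], "-", os.strerror(28))'
# prints: ENOSPC - No space left on device
```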
      

      Understandably, a lot of errors ensue after that, and then we get leaked objects in the cache as we try to remove the modules:

      [13293.993223] Lustre: 31450:0:(gss_mech_switch.c:80:lgss_mech_unregister()) Unregister krb5 mechanism
      [13295.637401] LNet: Removed LNI 192.168.123.193@tcp
      [13296.208435] kmem_cache_destroy ldiskfs_inode_cache: Slab cache still has objects
      [13296.230451] CPU: 7 PID: 31473 Comm: rmmod Tainted: P        W  OE  ------------   3.10.0-debug #2
      [13296.246837] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13296.247525]  ffff880322083b00 0000000055345a9a ffff880303a6fec0 ffffffff816fd3e4
      [13296.248832]  ffff880303a6fee0 ffffffff811965cf fffffffffffffff5 ffffffffa0b65b80
      [13296.250123]  ffff880303a6fef0 ffffffffa0b46cc0 ffff880303a6ff78 ffffffff810f06bb
      [13296.251433] Call Trace:
      [13296.252040]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
      [13296.252717]  [<ffffffff811965cf>] kmem_cache_destroy+0xdf/0xf0
      [13296.253481]  [<ffffffffa0b46cc0>] ldiskfs_exit_fs+0x3c/0x37c [ldiskfs]
      [13296.267203]  [<ffffffff810f06bb>] SyS_delete_module+0x16b/0x2d0
      [13296.267942]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
      [13296.375897] kmem_cache_destroy ldiskfs_extent_status: Slab cache still has objects
      [13296.386704] CPU: 7 PID: 31473 Comm: rmmod Tainted: P        W  OE  ------------   3.10.0-debug #2
      [13296.388371] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13296.389248]  ffff88030d64d6c0 0000000055345a9a ffff880303a6feb0 ffffffff816fd3e4
      [13296.390923]  ffff880303a6fed0 ffffffff811965cf fffffffffffffff5 ffffffffa0b65b80
      [13296.392558]  ffff880303a6fee0 ffffffffa0b423ca ffff880303a6fef0 ffffffffa0b46cf8
      [13296.394264] Call Trace:
      [13296.397125]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
      [13296.398053]  [<ffffffff811965cf>] kmem_cache_destroy+0xdf/0xf0
      [13296.399147]  [<ffffffffa0b423ca>] ldiskfs_exit_es+0x1a/0x20 [ldiskfs]
      [13296.400204]  [<ffffffffa0b46cf8>] ldiskfs_exit_fs+0x74/0x37c [ldiskfs]
      [13296.401109]  [<ffffffff810f06bb>] SyS_delete_module+0x16b/0x2d0
      [13296.402219]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
      


        Activity


           The MDS journal size is too small with DNE. The default journal size is 16M, and 4M for a single transaction is no longer enough; we probably need to scale it with MDSCOUNT.
           [root@rhel7-1 tests]# REFORMAT=yes MDSCOUNT=4 MDSSIZE=512000 MDSJOURNALSIZE=128 sh sanity.sh
           These options avoid the jbd2 panic in 300k, and the other tests passed without osd_trans_exec_op() errors.

           shadow Alexey Lyashkov added a comment
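           The "calculation against MDSCOUNT" could be sketched like this (a minimal sketch; the 32 MB-per-MDT factor is an assumed illustration value, not a tested recommendation; MDSJOURNALSIZE is in MB):

```shell
#!/bin/sh
# Hypothetical heuristic: grow the MDS journal with the number of MDTs,
# since a striped-directory transaction reserves credits per stripe.
MDSCOUNT=${MDSCOUNT:-4}
MDSJOURNALSIZE=$((32 * MDSCOUNT))   # assumption: 32 MB of journal per MDT
echo "MDSCOUNT=$MDSCOUNT -> MDSJOURNALSIZE=$MDSJOURNALSIZE"
```

           With MDSCOUNT=4 this yields the MDSJOURNALSIZE=128 that avoided the panic above.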
          hornc Chris Horn added a comment -

           FWIW - I do not see the same behavior when I run sanity on a remote client. There are messages in the server's dmesg from osd_trans_exec_op() and osd_trans_dump_creds() for subtests 300d, 300g, 300h, and 300n, but not for 300k. And I don't see the "aborting transaction" error from ldiskfs_getblk(), nor is the MDT getting remounted read-only.

          I only see the read-only remount when the client is on the same node as the server.

           hornc Chris Horn added a comment - edited

           I believe I'm seeing the same issue with Lustre 2.11 on a SLES 15 VM:

          [ 5011.405770] Lustre: DEBUG MARKER: == sanity test 300k: test large striped directory ==================================================== 13:09:56 (1532110196)
          [ 5011.523914] Lustre: *** cfs_fail_loc=1703, val=0***
          [ 5011.529595] Lustre: 1525:0:(osd_internal.h:1168:osd_trans_exec_op()) lustre-MDT0000: opcode 3: before 512 < left 2160, rollback = 4
          [ 5011.529604] Lustre: 1525:0:(osd_internal.h:1168:osd_trans_exec_op()) Skipped 71 previous similar messages
          [ 5011.529613] Lustre: 1525:0:(osd_handler.c:1745:osd_trans_dump_creds())   create: 270/2160/0, destroy: 0/0/0
          [ 5011.529617] Lustre: 1525:0:(osd_handler.c:1745:osd_trans_dump_creds()) Skipped 23 previous similar messages
          [ 5011.529623] Lustre: 1525:0:(osd_handler.c:1752:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 516/49873/0
          [ 5011.529627] Lustre: 1525:0:(osd_handler.c:1752:osd_trans_dump_creds()) Skipped 23 previous similar messages
          [ 5011.529636] Lustre: 1525:0:(osd_handler.c:1762:osd_trans_dump_creds())   write: 53/1648/0, punch: 0/0/0, quota 3/3/0
          [ 5011.529640] Lustre: 1525:0:(osd_handler.c:1762:osd_trans_dump_creds()) Skipped 23 previous similar messages
          [ 5011.529648] Lustre: 1525:0:(osd_handler.c:1769:osd_trans_dump_creds())   insert: 1310/26350/0, delete: 0/0/0
          [ 5011.529652] Lustre: 1525:0:(osd_handler.c:1769:osd_trans_dump_creds()) Skipped 23 previous similar messages
          [ 5011.529659] Lustre: 1525:0:(osd_handler.c:1776:osd_trans_dump_creds())   ref_add: 770/770/0, ref_del: 0/0/0
          [ 5011.529664] Lustre: 1525:0:(osd_handler.c:1776:osd_trans_dump_creds()) Skipped 23 previous similar messages
          [ 5015.565453] Lustre: 1525:0:(osd_internal.h:1168:osd_trans_exec_op()) lustre-MDT0000: opcode 8: before 17 < left 25904, rollback = 9
          [ 5015.565459] Lustre: 1525:0:(osd_internal.h:1168:osd_trans_exec_op()) Skipped 2125 previous similar messages
          [ 5015.995265] ------------[ cut here ]------------
          [ 5015.995346] WARNING: CPU: 26 PID: 1525 at /tmp/rpmbuild-lustre-build-dm3Xm3GS/BUILD/lustre-2.11.0_1_cray_131_gba03050/ldiskfs/ext4_jbd2.c:277 __ldiskfs_handle_dirty_metadata+0xa7/0x1a0 [ldiskfs]
          [ 5015.995348] Modules linked in: obdecho(OEN) brd ofd(OEN) ost(OEN) osp(OEN) mdd(OEN) lod(OEN) mdt(OEN) lfsck(OEN) mgs(OEN) osd_ldiskfs(OEN) ldiskfs(OEN) lquota(OEN) lustre(OEN) mgc(OEN) lov(OEN) mdc(OEN) osc(OEN) lmv(OEN) fid(OEN) fld(OEN) ptlrpc(OEN) obdclass(OEN) ksocklnd(OEN) lnet(OEN) libcfs(OEN) loop quota_v2 quota_tree af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport sb_edac coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 vsock vmwgfx ttm drm_kms_helper drm vmw_balloon crypto_simd ppdev drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops glue_helper vmxnet3 cryptd joydev shpchp vmw_vmci i2c_piix4 parport_pc pcspkr parport ac button ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sd_mod ata_piix ahci libahci
          [ 5015.995423]  serio_raw mptspi scsi_transport_spi libata mptscsih mptbase floppy sg scsi_mod autofs4 [last unloaded: obdecho]
          [ 5015.995434] Supported: No, Unsupported modules are loaded
          [ 5015.995439] CPU: 26 PID: 1525 Comm: mdt03_004 Tainted: G        W  OE      4.12.14-23-default #1 SLE15
          [ 5015.995440] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
          [ 5015.995442] task: ffff9ed67ebb8900 task.stack: ffffaf9ed8468000
          [ 5015.995456] RIP: 0010:__ldiskfs_handle_dirty_metadata+0xa7/0x1a0 [ldiskfs]
          [ 5015.995458] RSP: 0018:ffffaf9ed846b770 EFLAGS: 00010286
          [ 5015.995460] RAX: ffff9ed89de62800 RBX: ffff9ed874087f70 RCX: ffff9ed8a642ca40
          [ 5015.995461] RDX: 0000000000000000 RSI: ffff9ed874087f70 RDI: ffff9ed8a64bc210
          [ 5015.995462] RBP: ffff9ed8a64bc210 R08: ffff9ed874087f70 R09: 0000000000000000
          [ 5015.995463] R10: ffff9ed8ac3a4000 R11: 0000000000000040 R12: ffff9ed8a642ca40
          [ 5015.995465] R13: 00000000ffffffe4 R14: ffffffffc10314d0 R15: 00000000000003fb
          [ 5015.995468] FS:  0000000000000000(0000) GS:ffff9ed8bfc80000(0000) knlGS:0000000000000000
          [ 5015.995469] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [ 5015.995470] CR2: 00007f651bf8e860 CR3: 0000000437a7a000 CR4: 00000000000406e0
          [ 5015.995558] Call Trace:
          [ 5015.995584]  ldiskfs_getblk+0xeb/0x170 [ldiskfs]
          [ 5015.995597]  ldiskfs_bread+0xe/0x80 [ldiskfs]
          [ 5015.995610]  ldiskfs_append+0x5f/0x100 [ldiskfs]
          [ 5015.995623]  ldiskfs_init_new_dir+0xc1/0x1a0 [ldiskfs]
          [ 5015.995634]  ldiskfs_add_dot_dotdot+0x41/0x70 [ldiskfs]
          [ 5015.995686]  osd_add_dot_dotdot_internal.isra.79+0x53/0x70 [osd_ldiskfs]
          [ 5015.995698]  osd_index_ea_insert+0xe81/0xfb0 [osd_ldiskfs]
          [ 5015.995737]  lod_sub_insert+0x168/0x360 [lod]
          [ 5015.995750]  lod_xattr_set_lmv.isra.49.constprop.65+0x65d/0xda0 [lod]
          [ 5015.995763]  lod_dir_striping_create_internal+0x3a8/0x810 [lod]
          [ 5015.995774]  lod_xattr_set+0x27e/0x740 [lod]
          [ 5015.995882]  ? lu_context_refill+0x15/0x50 [obdclass]
          [ 5015.995910]  mdd_create_object+0x3e9/0x950 [mdd]
          [ 5015.995920]  mdd_create+0x11c0/0x1d60 [mdd]
          [ 5015.995984]  ? mdt_version_save+0x66/0x170 [mdt]
          [ 5015.995999]  mdt_create+0x8cd/0xc50 [mdt]
          [ 5015.996030]  ? lprocfs_job_stats_log+0x2b4/0x680 [obdclass]
          [ 5015.996046]  mdt_reint_create+0x23d/0x340 [mdt]
          [ 5015.996060]  mdt_reint_rec+0x7f/0x250 [mdt]
          [ 5015.996075]  mdt_reint_internal+0x5ee/0x680 [mdt]
          [ 5015.996089]  mdt_reint+0x5e/0x110 [mdt]
          [ 5015.996311]  tgt_request_handle+0x814/0x1500 [ptlrpc]
          [ 5015.996363]  ptlrpc_server_handle_request+0x2de/0xb90 [ptlrpc]
          [ 5015.996411]  ptlrpc_main+0xaf7/0x1f10 [ptlrpc]
          [ 5015.996456]  ? ptlrpc_register_service+0xec0/0xec0 [ptlrpc]
          [ 5015.996464]  kthread+0x11a/0x130
          [ 5015.996467]  ? kthread_create_on_node+0x40/0x40
          [ 5015.996475]  ? do_syscall_64+0x74/0x140
          [ 5015.996481]  ? SyS_exit_group+0x10/0x10
          [ 5015.996489]  ret_from_fork+0x35/0x40
          [ 5015.996492] Code: e5 48 89 de 48 89 ef e8 88 eb 34 ff f6 45 1c 08 41 89 c5 75 d4 48 8b 45 00 48 85 c0 74 cb 48 8b 00 f6 00 02 75 c3 45 85 ed 74 bb <0f> 0b 48 c7 c2 60 24 03 c1 45 89 e8 48 89 e9 44 89 fe 4c 89 f7
          [ 5015.996521] ---[ end trace cbe0954090541ae3 ]---
          [ 5015.996526] LDISKFS-fs: ldiskfs_getblk:1019: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
          [ 5015.998649] LDISKFS-fs error (device sdc): ldiskfs_getblk:1019: inode #26034: block 17004: comm mdt03_004: journal_dirty_metadata failed: handle type 0 started at line 1841, credits 512/0, errcode -28
          [ 5016.002471] Aborting journal on device sdc-8. 

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Oleg Drokin (green)
            Votes: 0
            Watchers: 5