Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10762

sanity test 300k transaction accounting problem

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.11.0
    • 3
    • 9223372036854775807

    Description

      I added DNE testing into my testbed and test 300k seems to be having problems on ldiskfs.

      It starts like this:

      [13013.691307] Lustre: DEBUG MARKER: == sanity test 300k: test large striped directory ==================================================== 18:35:09 (1520206509)
      [13014.055529] Lustre: *** cfs_fail_loc=1703, val=0***
      [13014.098222] Lustre: 24226:0:(osd_internal.h:1151:osd_trans_exec_op()) lustre-MDT0000: opcode 3: before 512 < left 1480, rollback = 4
      [13014.099829] Lustre: 24226:0:(osd_internal.h:1151:osd_trans_exec_op()) Skipped 23 previous similar messages
      [13014.101329] Lustre: 24226:0:(osd_handler.c:1741:osd_trans_dump_creds())   create: 185/1480/0, destroy: 0/0/0
      [13014.102702] Lustre: 24226:0:(osd_handler.c:1741:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13014.103935] Lustre: 24226:0:(osd_handler.c:1748:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 346/33383/0
      [13014.105479] Lustre: 24226:0:(osd_handler.c:1748:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13014.106887] Lustre: 24226:0:(osd_handler.c:1758:osd_trans_dump_creds())   write: 53/1648/0, punch: 0/0/0, quota 2/2/0
      [13014.107836] Lustre: 24226:0:(osd_handler.c:1758:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13014.112440] Lustre: 24226:0:(osd_handler.c:1765:osd_trans_dump_creds())   insert: 1055/23885/0, delete: 0/0/0
      [13014.114028] Lustre: 24226:0:(osd_handler.c:1765:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13014.115799] Lustre: 24226:0:(osd_handler.c:1772:osd_trans_dump_creds())   ref_add: 685/685/0, ref_del: 0/0/0
      [13014.132704] Lustre: 24226:0:(osd_handler.c:1772:osd_trans_dump_creds()) Skipped 23 previous similar messages
      [13017.644534] ------------[ cut here ]------------
      [13017.650823] WARNING: at /home/green/git/lustre-release/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1d2/0x230 [ldiskfs]()
      [13017.671187] Modules linked in: brd lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate mbcache jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper drm i2c_piix4 ata_piix virtio_balloon pcspkr serio_raw virtio_blk i2c_core virtio_console libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: obdecho]
      [13017.687393] CPU: 6 PID: 24226 Comm: mdt03_004 Tainted: P           OE  ------------   3.10.0-debug #2
      [13017.688706] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13017.689495]  0000000000000000 000000009b5bd8bc ffff880213757438 ffffffff816fd3e4
      [13017.691291]  ffff880213757470 ffffffff81076250 ffff880076c48f88 ffff880237e11fa0
      [13017.692912]  ffff8800855c39f8 ffffffffa0b48d9c 0000000000000323 ffff880213757480
      [13017.694222] Call Trace:
      [13017.694832]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
      [13017.695505]  [<ffffffff81076250>] warn_slowpath_common+0x70/0xb0
      [13017.696204]  [<ffffffff8107639a>] warn_slowpath_null+0x1a/0x20
      [13017.697091]  [<ffffffffa0b38192>] __ldiskfs_handle_dirty_metadata+0x1d2/0x230 [ldiskfs]
      [13017.698752]  [<ffffffffa0b17602>] ldiskfs_getblk+0x142/0x210 [ldiskfs]
      [13017.699645]  [<ffffffffa0b176f7>] ldiskfs_bread+0x27/0xe0 [ldiskfs]
      [13017.700518]  [<ffffffffa0b091a1>] ldiskfs_append+0x81/0x150 [ldiskfs]
      [13017.701366]  [<ffffffffa0b1022f>] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs]
      [13017.702238]  [<ffffffffa0b103de>] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs]
      [13017.703247]  [<ffffffffa0b8559f>] osd_add_dot_dotdot_internal.isra.73+0x5f/0x80 [osd_ldiskfs]
      [13017.706196]  [<ffffffffa0b9a962>] osd_index_ea_insert+0xa22/0x11c0 [osd_ldiskfs]
      [13017.707436]  [<ffffffffa0dbb1b2>] lod_sub_insert+0x1c2/0x340 [lod]
      [13017.708022]  [<ffffffffa0da0387>] lod_xattr_set_lmv.isra.49.constprop.66+0x647/0xe30 [lod]
      [13017.709667]  [<ffffffffa0dae0cf>] lod_dir_striping_create_internal+0x46f/0x950 [lod]
      [13017.712608]  [<ffffffffa0dafb2f>] lod_xattr_set+0x25f/0x860 [lod]
      [13017.713517]  [<ffffffffa0daf8d0>] ? lod_layout_del+0xab0/0xab0 [lod]
      [13017.714535]  [<ffffffffa0c5e6da>] mdo_xattr_set+0x7a/0x180 [mdd]
      [13017.715478]  [<ffffffffa0c76981>] ? mdd_env_info+0x21/0x60 [mdd]
      [13017.716396]  [<ffffffffa0c649f6>] mdd_create_object+0x416/0x980 [mdd]
      [13017.717305]  [<ffffffffa0c6c383>] mdd_create+0xce3/0x13a0 [mdd]
      [13017.718352]  [<ffffffffa0ce7296>] mdt_create+0x846/0xbb0 [mdt]
      [13017.719404]  [<ffffffffa03d1b89>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [13017.720412]  [<ffffffffa0406481>] ? lprocfs_job_stats_log+0xd1/0x600 [obdclass]
      [13017.726742]  [<ffffffffa0ce776b>] mdt_reint_create+0x16b/0x350 [mdt]
      [13017.727541]  [<ffffffffa0ce8cc0>] mdt_reint_rec+0x80/0x210 [mdt]
      [13017.728262]  [<ffffffffa0cc87bb>] mdt_reint_internal+0x5fb/0x990 [mdt]
      [13017.728967]  [<ffffffffa0cd3587>] mdt_reint+0x67/0x140 [mdt]
      [13017.730200]  [<ffffffffa063b28b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [13017.730981]  [<ffffffffa05e1ce1>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [13017.732359]  [<ffffffffa05e5a98>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [13017.733076]  [<ffffffffa05e5040>] ? ptlrpc_register_service+0xeb0/0xeb0 [ptlrpc]
      [13017.748161]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [13017.750350]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [13017.751067]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [13017.751743]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
      [13017.752450] ---[ end trace 4ba61d7be859e71c ]---
      [13017.753099] LDISKFS-fs: ldiskfs_getblk:803: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
      [13017.758373] LDISKFS-fs error (device loop0): ldiskfs_getblk:803: inode #26416: block 17641: comm mdt03_004: journal_dirty_metadata failed: handle type 0 started at line 1837, credits 512/0, errcode -28
      [13017.761414] Aborting journal on device loop0-8.
      

      Understandably a lot of errors ensure after that and then we get leaked objects in cache as we try to remove the modules:

      [13293.993223] Lustre: 31450:0:(gss_mech_switch.c:80:lgss_mech_unregister()) Unregister krb5 mechanism
      [13295.637401] LNet: Removed LNI 192.168.123.193@tcp
      [13296.208435] kmem_cache_destroy ldiskfs_inode_cache: Slab cache still has objects
      [13296.230451] CPU: 7 PID: 31473 Comm: rmmod Tainted: P        W  OE  ------------   3.10.0-debug #2
      [13296.246837] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13296.247525]  ffff880322083b00 0000000055345a9a ffff880303a6fec0 ffffffff816fd3e4
      [13296.248832]  ffff880303a6fee0 ffffffff811965cf fffffffffffffff5 ffffffffa0b65b80
      [13296.250123]  ffff880303a6fef0 ffffffffa0b46cc0 ffff880303a6ff78 ffffffff810f06bb
      [13296.251433] Call Trace:
      [13296.252040]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
      [13296.252717]  [<ffffffff811965cf>] kmem_cache_destroy+0xdf/0xf0
      [13296.253481]  [<ffffffffa0b46cc0>] ldiskfs_exit_fs+0x3c/0x37c [ldiskfs]
      [13296.267203]  [<ffffffff810f06bb>] SyS_delete_module+0x16b/0x2d0
      [13296.267942]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
      [13296.375897] kmem_cache_destroy ldiskfs_extent_status: Slab cache still has objects
      [13296.386704] CPU: 7 PID: 31473 Comm: rmmod Tainted: P        W  OE  ------------   3.10.0-debug #2
      [13296.388371] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13296.389248]  ffff88030d64d6c0 0000000055345a9a ffff880303a6feb0 ffffffff816fd3e4
      [13296.390923]  ffff880303a6fed0 ffffffff811965cf fffffffffffffff5 ffffffffa0b65b80
      [13296.392558]  ffff880303a6fee0 ffffffffa0b423ca ffff880303a6fef0 ffffffffa0b46cf8
      [13296.394264] Call Trace:
      [13296.397125]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
      [13296.398053]  [<ffffffff811965cf>] kmem_cache_destroy+0xdf/0xf0
      [13296.399147]  [<ffffffffa0b423ca>] ldiskfs_exit_es+0x1a/0x20 [ldiskfs]
      [13296.400204]  [<ffffffffa0b46cf8>] ldiskfs_exit_fs+0x74/0x37c [ldiskfs]
      [13296.401109]  [<ffffffff810f06bb>] SyS_delete_module+0x16b/0x2d0
      [13296.402219]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
      

      Attachments

        Activity

          People

            wc-triage WC Triage
            green Oleg Drokin
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: