Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Labels: None
- Affects Version/s: Lustre 2.11.0
- Severity: 3
Description
I added DNE testing into my testbed and test 300k seems to be having problems on ldiskfs.
It starts like this:
{noformat}
[13013.691307] Lustre: DEBUG MARKER: == sanity test 300k: test large striped directory ==================================================== 18:35:09 (1520206509)
[13014.055529] Lustre: *** cfs_fail_loc=1703, val=0***
[13014.098222] Lustre: 24226:0:(osd_internal.h:1151:osd_trans_exec_op()) lustre-MDT0000: opcode 3: before 512 < left 1480, rollback = 4
[13014.099829] Lustre: 24226:0:(osd_internal.h:1151:osd_trans_exec_op()) Skipped 23 previous similar messages
[13014.101329] Lustre: 24226:0:(osd_handler.c:1741:osd_trans_dump_creds()) create: 185/1480/0, destroy: 0/0/0
[13014.102702] Lustre: 24226:0:(osd_handler.c:1741:osd_trans_dump_creds()) Skipped 23 previous similar messages
[13014.103935] Lustre: 24226:0:(osd_handler.c:1748:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 346/33383/0
[13014.105479] Lustre: 24226:0:(osd_handler.c:1748:osd_trans_dump_creds()) Skipped 23 previous similar messages
[13014.106887] Lustre: 24226:0:(osd_handler.c:1758:osd_trans_dump_creds()) write: 53/1648/0, punch: 0/0/0, quota 2/2/0
[13014.107836] Lustre: 24226:0:(osd_handler.c:1758:osd_trans_dump_creds()) Skipped 23 previous similar messages
[13014.112440] Lustre: 24226:0:(osd_handler.c:1765:osd_trans_dump_creds()) insert: 1055/23885/0, delete: 0/0/0
[13014.114028] Lustre: 24226:0:(osd_handler.c:1765:osd_trans_dump_creds()) Skipped 23 previous similar messages
[13014.115799] Lustre: 24226:0:(osd_handler.c:1772:osd_trans_dump_creds()) ref_add: 685/685/0, ref_del: 0/0/0
[13014.132704] Lustre: 24226:0:(osd_handler.c:1772:osd_trans_dump_creds()) Skipped 23 previous similar messages
[13017.644534] ------------[ cut here ]------------
[13017.650823] WARNING: at /home/green/git/lustre-release/ldiskfs/ext4_jbd2.c:266 __ldiskfs_handle_dirty_metadata+0x1d2/0x230 [ldiskfs]()
[13017.671187] Modules linked in: brd lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate mbcache jbd2 syscopyarea sysfillrect sysimgblt ata_generic ttm pata_acpi drm_kms_helper drm i2c_piix4 ata_piix virtio_balloon pcspkr serio_raw virtio_blk i2c_core virtio_console libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: obdecho]
[13017.687393] CPU: 6 PID: 24226 Comm: mdt03_004 Tainted: P OE ------------ 3.10.0-debug #2
[13017.688706] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13017.689495] 0000000000000000 000000009b5bd8bc ffff880213757438 ffffffff816fd3e4
[13017.691291] ffff880213757470 ffffffff81076250 ffff880076c48f88 ffff880237e11fa0
[13017.692912] ffff8800855c39f8 ffffffffa0b48d9c 0000000000000323 ffff880213757480
[13017.694222] Call Trace:
[13017.694832] [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[13017.695505] [<ffffffff81076250>] warn_slowpath_common+0x70/0xb0
[13017.696204] [<ffffffff8107639a>] warn_slowpath_null+0x1a/0x20
[13017.697091] [<ffffffffa0b38192>] __ldiskfs_handle_dirty_metadata+0x1d2/0x230 [ldiskfs]
[13017.698752] [<ffffffffa0b17602>] ldiskfs_getblk+0x142/0x210 [ldiskfs]
[13017.699645] [<ffffffffa0b176f7>] ldiskfs_bread+0x27/0xe0 [ldiskfs]
[13017.700518] [<ffffffffa0b091a1>] ldiskfs_append+0x81/0x150 [ldiskfs]
[13017.701366] [<ffffffffa0b1022f>] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs]
[13017.702238] [<ffffffffa0b103de>] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs]
[13017.703247] [<ffffffffa0b8559f>] osd_add_dot_dotdot_internal.isra.73+0x5f/0x80 [osd_ldiskfs]
[13017.706196] [<ffffffffa0b9a962>] osd_index_ea_insert+0xa22/0x11c0 [osd_ldiskfs]
[13017.707436] [<ffffffffa0dbb1b2>] lod_sub_insert+0x1c2/0x340 [lod]
[13017.708022] [<ffffffffa0da0387>] lod_xattr_set_lmv.isra.49.constprop.66+0x647/0xe30 [lod]
[13017.709667] [<ffffffffa0dae0cf>] lod_dir_striping_create_internal+0x46f/0x950 [lod]
[13017.712608] [<ffffffffa0dafb2f>] lod_xattr_set+0x25f/0x860 [lod]
[13017.713517] [<ffffffffa0daf8d0>] ? lod_layout_del+0xab0/0xab0 [lod]
[13017.714535] [<ffffffffa0c5e6da>] mdo_xattr_set+0x7a/0x180 [mdd]
[13017.715478] [<ffffffffa0c76981>] ? mdd_env_info+0x21/0x60 [mdd]
[13017.716396] [<ffffffffa0c649f6>] mdd_create_object+0x416/0x980 [mdd]
[13017.717305] [<ffffffffa0c6c383>] mdd_create+0xce3/0x13a0 [mdd]
[13017.718352] [<ffffffffa0ce7296>] mdt_create+0x846/0xbb0 [mdt]
[13017.719404] [<ffffffffa03d1b89>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[13017.720412] [<ffffffffa0406481>] ? lprocfs_job_stats_log+0xd1/0x600 [obdclass]
[13017.726742] [<ffffffffa0ce776b>] mdt_reint_create+0x16b/0x350 [mdt]
[13017.727541] [<ffffffffa0ce8cc0>] mdt_reint_rec+0x80/0x210 [mdt]
[13017.728262] [<ffffffffa0cc87bb>] mdt_reint_internal+0x5fb/0x990 [mdt]
[13017.728967] [<ffffffffa0cd3587>] mdt_reint+0x67/0x140 [mdt]
[13017.730200] [<ffffffffa063b28b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
[13017.730981] [<ffffffffa05e1ce1>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
[13017.732359] [<ffffffffa05e5a98>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
[13017.733076] [<ffffffffa05e5040>] ? ptlrpc_register_service+0xeb0/0xeb0 [ptlrpc]
[13017.748161] [<ffffffff810a2eba>] kthread+0xea/0xf0
[13017.750350] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[13017.751067] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[13017.751743] [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[13017.752450] ---[ end trace 4ba61d7be859e71c ]---
[13017.753099] LDISKFS-fs: ldiskfs_getblk:803: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[13017.758373] LDISKFS-fs error (device loop0): ldiskfs_getblk:803: inode #26416: block 17641: comm mdt03_004: journal_dirty_metadata failed: handle type 0 started at line 1837, credits 512/0, errcode -28
[13017.761414] Aborting journal on device loop0-8.
{noformat}
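For context, error 28 here is ENOSPC from the journal layer, not a full disk: the transaction handle exhausted its reserved credits. The osd_trans_exec_op() warning already shows the mismatch (the handle holds 512 credits while the striped-directory create still wants 1480 more), so the first metadata dirty past the reservation fails and aborts the journal. A minimal userspace sketch of that failure pattern, using hypothetical simplified names (struct handle, handle_dirty_metadata) rather than the real ldiskfs/jbd2 API:

{code:c}
#include <stdio.h>

/* Hypothetical, simplified stand-in for a journal transaction handle. */
struct handle {
	int credits_remaining;	/* reserved buffer credits left */
};

/* Analogue of __ldiskfs_handle_dirty_metadata(): each metadata block
 * dirtied consumes one reserved credit; with none left it fails with
 * -ENOSPC (28), which ldiskfs_getblk() treats as fatal and which shows
 * up above as "aborting transaction: error 28". */
static int handle_dirty_metadata(struct handle *h)
{
	if (h->credits_remaining <= 0)
		return -28;		/* -ENOSPC */
	h->credits_remaining--;
	return 0;
}

int main(void)
{
	/* The trace shows a handle started with 512 credits while the
	 * create still needed 1480 ("before 512 < left 1480"). */
	struct handle h = { .credits_remaining = 512 };
	int rc = 0, i;

	for (i = 0; i < 1480 && rc == 0; i++)
		rc = handle_dirty_metadata(&h);

	printf("rc = %d after %d operations\n", rc, i);	/* rc = -28 */
	return 0;
}
{code}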
Understandably, a lot of errors ensue after that, and then we get leaked objects in the cache as we try to remove the modules:
{noformat}
[13293.993223] Lustre: 31450:0:(gss_mech_switch.c:80:lgss_mech_unregister()) Unregister krb5 mechanism
[13295.637401] LNet: Removed LNI 192.168.123.193@tcp
[13296.208435] kmem_cache_destroy ldiskfs_inode_cache: Slab cache still has objects
[13296.230451] CPU: 7 PID: 31473 Comm: rmmod Tainted: P W OE ------------ 3.10.0-debug #2
[13296.246837] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13296.247525] ffff880322083b00 0000000055345a9a ffff880303a6fec0 ffffffff816fd3e4
[13296.248832] ffff880303a6fee0 ffffffff811965cf fffffffffffffff5 ffffffffa0b65b80
[13296.250123] ffff880303a6fef0 ffffffffa0b46cc0 ffff880303a6ff78 ffffffff810f06bb
[13296.251433] Call Trace:
[13296.252040] [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[13296.252717] [<ffffffff811965cf>] kmem_cache_destroy+0xdf/0xf0
[13296.253481] [<ffffffffa0b46cc0>] ldiskfs_exit_fs+0x3c/0x37c [ldiskfs]
[13296.267203] [<ffffffff810f06bb>] SyS_delete_module+0x16b/0x2d0
[13296.267942] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[13296.375897] kmem_cache_destroy ldiskfs_extent_status: Slab cache still has objects
[13296.386704] CPU: 7 PID: 31473 Comm: rmmod Tainted: P W OE ------------ 3.10.0-debug #2
[13296.388371] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13296.389248] ffff88030d64d6c0 0000000055345a9a ffff880303a6feb0 ffffffff816fd3e4
[13296.390923] ffff880303a6fed0 ffffffff811965cf fffffffffffffff5 ffffffffa0b65b80
[13296.392558] ffff880303a6fee0 ffffffffa0b423ca ffff880303a6fef0 ffffffffa0b46cf8
[13296.394264] Call Trace:
[13296.397125] [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[13296.398053] [<ffffffff811965cf>] kmem_cache_destroy+0xdf/0xf0
[13296.399147] [<ffffffffa0b423ca>] ldiskfs_exit_es+0x1a/0x20 [ldiskfs]
[13296.400204] [<ffffffffa0b46cf8>] ldiskfs_exit_fs+0x74/0x37c [ldiskfs]
[13296.401109] [<ffffffff810f06bb>] SyS_delete_module+0x16b/0x2d0
[13296.402219] [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
{noformat}
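The "Slab cache still has objects" warnings follow directly from the journal abort: inodes that could never be written back are never released, so when rmmod reaches ldiskfs_exit_fs() it calls kmem_cache_destroy() on non-empty caches. A small userspace analogue of that check, with hypothetical names (obj_cache, cache_destroy) standing in for the kernel slab API:

{code:c}
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of a slab cache that tracks how many
 * objects are still allocated from it. */
struct obj_cache {
	const char *name;
	int live_objects;
};

static void *cache_alloc(struct obj_cache *c, size_t size)
{
	c->live_objects++;
	return malloc(size);
}

/* Analogue of kmem_cache_destroy(): it cannot safely tear down a cache
 * that still has live objects, so it warns -- the same message rmmod
 * printed for ldiskfs_inode_cache and ldiskfs_extent_status. */
static void cache_destroy(struct obj_cache *c)
{
	if (c->live_objects != 0)
		fprintf(stderr,
			"kmem_cache_destroy %s: Slab cache still has objects\n",
			c->name);
}

int main(void)
{
	struct obj_cache c = { .name = "ldiskfs_inode_cache", .live_objects = 0 };

	/* The aborted journal leaves cached inodes that are never freed... */
	void *leaked = cache_alloc(&c, 64);
	(void)leaked;

	/* ...so module unload finds the cache non-empty and warns. */
	cache_destroy(&c);
	return 0;
}
{code}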