Details
- Type: Bug
- Resolution: Duplicate
- Priority: Critical
- Affects Version/s: Lustre 2.8.0
- Fix Version/s: None
- Environment: el7.1
- Severity: 2
- Rank: 9223372036854775807
Description
New bug, so far seen only in autotest runs with el7.1 server code; it was not seen in previous test runs on el7.0. After the failure the Lustre filesystem becomes read-only, leading to many more failures.
Yang Sheng reports:
[4/4/15, 8:46:51 PM] yang sheng: I also encountered this error in my test environment. I found it is caused by the journal space not being enough to handle the dirty data. Modifying MDS_FS_MKFS_OPTS='-J size=xxx' works well.
[4/4/15, 8:57:22 PM] yang sheng: I'll do more investigation to reveal the root cause.
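A minimal sketch of Yang Sheng's workaround, assuming the targets are formatted through the standard Lustre test framework (llmount.sh); the 1024 MB journal size is only an illustrative value, not a verified fix for this ticket:

# Format the MDT with a larger journal via the test framework
# (1024 MB is an assumed size; tune as needed)
MDS_FS_MKFS_OPTS='-J size=1024' sh llmount.sh

# Equivalent direct format outside the test framework
# (device path and fsname are hypothetical)
mkfs.lustre --mdt --mgs --fsname=testfs --index=0 \
    --mkfsoptions='-J size=1024' /dev/vdb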
Seen in sanity test_102i: https://testing.hpdd.intel.com/test_sets/b849c698-da5b-11e4-8289-5254006e85c2
[ 2666.355166] ------------[ cut here ]------------
[ 2666.356994] WARNING: at /var/lib/jenkins/tmp/lustre_el7_topdir/BUILD/BUILD/lustre-2.7.51/ldiskfs/ext4_jbd2.c:260 __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]()
[ 2666.361168] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) ldiskfs(OF) dm_mod rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ppdev ib_cm parport_pc iw_cm parport ib_sa ib_mad virtio_balloon pcspkr i2c_piix4 ib_core serio_raw ib_addr ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper ttm 8139too virtio_pci virtio_ring virtio ata_piix floppy drm i2c_core libata 8139cp mii [last unloaded: llog_test]
[ 2666.378809] CPU: 0 PID: 11066 Comm: mdt00_002 Tainted: GF W O-------------- 3.10.0-229.1.2.el7_lustre.g5f2eb1d.x86_64 #1
[ 2666.382889] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 2666.384948] 0000000000000000 00000000e532c5a7 ffff880077c677f8 ffffffff81604d2a
[ 2666.387191] ffff880077c67830 ffffffff8106e34b ffff880035b8bdd0 ffff88007cc414b0
[ 2666.389386] 0000000000000000 ffffffffa05ba540 00000000000013f2 ffff880077c67840
[ 2666.391661] Call Trace:
[ 2666.393487] [] dump_stack+0x19/0x1b
[ 2666.395480] [] warn_slowpath_common+0x6b/0xb0
[ 2666.397550] [] warn_slowpath_null+0x1a/0x20
[ 2666.399640] [] __ldiskfs_handle_dirty_metadata+0x1c2/0x220 [ldiskfs]
[ 2666.401911] [] ? ldiskfs_dirty_inode+0x54/0x60 [ldiskfs]
[ 2666.404128] [] ldiskfs_free_blocks+0x5e6/0xb90 [ldiskfs]
[ 2666.406246] [] ldiskfs_xattr_release_block+0x275/0x330 [ldiskfs]
[ 2666.408443] [] ldiskfs_xattr_delete_inode+0x2bb/0x300 [ldiskfs]
[ 2666.410567] [] ldiskfs_evict_inode+0x1b5/0x610 [ldiskfs]
[ 2666.412594] [] evict+0xa7/0x170
[ 2666.414443] [] iput+0xf5/0x180
[ 2666.416275] [] osd_object_delete+0x1d3/0x300 [osd_ldiskfs]
[ 2666.418308] [] lu_object_free.isra.30+0x9d/0x1a0 [obdclass]
[ 2666.420350] [] lu_object_put+0xc2/0x320 [obdclass]
[ 2666.422389] [] mdt_reint_unlink+0x796/0x1150 [mdt]
[ 2666.424396] [] mdt_reint_rec+0x80/0x210 [mdt]
[ 2666.426508] [] mdt_reint_internal+0x58c/0x780 [mdt]
[ 2666.428542] [] mdt_reint+0x67/0x140 [mdt]
[ 2666.430616] [] tgt_request_handle+0x635/0xfd0 [ptlrpc]
[ 2666.432746] [] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[ 2666.434929] [] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
[ 2666.436900] [] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 2666.438920] [] ptlrpc_main+0xaf8/0x1ea0 [ptlrpc]
[ 2666.440866] [] ? __dequeue_entity+0x26/0x40
[ 2666.442788] [] ? ptlrpc_register_service+0xf00/0xf00 [ptlrpc]
[ 2666.444755] [] kthread+0xcf/0xe0
[ 2666.446590] [] ? kthread_create_on_node+0x140/0x140
[ 2666.448452] [] ret_from_fork+0x7c/0xb0
[ 2666.450308] [] ? kthread_create_on_node+0x140/0x140
[ 2666.452177] ---[ end trace 53ab1a0dad30f568 ]---
[ 2666.453923] LDISKFS-fs: ldiskfs_free_blocks:5106: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
[ 2666.456082] LDISKFS: jbd2_journal_dirty_metadata failed: handle type 5 started at line 240, credits 3/0, errcode -28
[ 2666.456945] LDISKFS-fs error (device dm-0) in ldiskfs_free_blocks:5118: error 28
[ 2666.469566] Aborting journal on device dm-0-8.
[ 2666.516889] LDISKFS-fs (dm-0): Remounting filesystem read-only
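Error 28 is ENOSPC from jbd2: the running handle had too few journal credits left ("credits 3/0") when __ldiskfs_handle_dirty_metadata was called from ldiskfs_free_blocks, so the journal was aborted and the filesystem remounted read-only. A minimal sketch for confirming the symptoms on the MDS, assuming stock e2fsprogs and using the dm-0 device named in the log above:

# Check the journal size recorded in the ldiskfs superblock
dumpe2fs -h /dev/dm-0 2>/dev/null | grep -i journal

# Confirm the read-only remount after the journal abort
awk '$4 ~ /^ro/' /proc/mounts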
Issue Links
- duplicates LU-6722 sanity-lfsck test_1a: FAIL: (3) Fail to start LFSCK for namespace! (Resolved)