[LU-8438] sanity test 182 hung Created: 26/Jul/16 Updated: 05/Aug/20 Resolved: 05/Aug/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Console log on MDS showed that:

Lustre: DEBUG MARKER: == sanity test 182: Test parallel modify metadata operations ========================================= 20:06:19 (1469502379)
BUG: soft lockup - CPU#0 stuck for 22s! [osp-syn-0-0:16414]
Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ppdev virtio_balloon pcspkr ib_core ib_addr parport_pc i2c_piix4 parport zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) nfsd spl(OE) zlib_deflate nfs_acl auth_rpcgss lockd grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus syscopyarea sysfillrect sysimgblt virtio_blk drm_kms_helper ttm ata_piix 8139too libata serio_raw drm virtio_pci virtio_ring
[ 0.000000] Initializing cgroup subsys cpuset

And the stack backtrace on MDS showed that:

jbd2/vda1-8 D ffff880036ba78e0 0 268 2 0x00000000
ffff880036ba7780 0000000000000046 ffff880079360000 ffff880036ba7fd8
ffff880036ba7fd8 ffff880036ba7fd8 ffff880079360000 ffff88007fd147c0
0000000000000000 7fffffffffffffff ffffffff81211940 ffff880036ba78e0
Call Trace:
[<ffffffff81211940>] ? generic_block_bmap+0x70/0x70
[<ffffffff8163ba29>] schedule+0x29/0x70
[<ffffffff81639719>] schedule_timeout+0x209/0x2d0
[<ffffffff81058aaf>] ? kvm_clock_get_cycles+0x1f/0x30
[<ffffffff81211940>] ? generic_block_bmap+0x70/0x70
[<ffffffff8163b05e>] io_schedule_timeout+0xae/0x130
[<ffffffff8163b0f8>] io_schedule+0x18/0x20
[<ffffffff8121194e>] sleep_on_buffer+0xe/0x20
[<ffffffff816398a0>] __wait_on_bit+0x60/0x90
[<ffffffff81211940>] ? generic_block_bmap+0x70/0x70
[<ffffffff81639957>] out_of_line_wait_on_bit+0x87/0xb0
[<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
[<ffffffff81212e10>] ? _submit_bh+0x160/0x210
[<ffffffff81213848>] bh_submit_read+0x78/0x90
[<ffffffffa01c43a7>] ext4_get_branch+0xd7/0x170 [ext4]
[<ffffffffa01c4d5e>] ext4_ind_map_blocks+0xce/0x760 [ext4]
[<ffffffffa01c6f8c>] ? __es_remove_extent+0x5c/0x300 [ext4]
[<ffffffffa0181c1b>] ext4_map_blocks+0x9b/0x590 [ext4]
[<ffffffffa01821cc>] _ext4_get_block+0xbc/0x1b0 [ext4]
[<ffffffffa01822d6>] ext4_get_block+0x16/0x20 [ext4]
[<ffffffff8121191b>] generic_block_bmap+0x4b/0x70
[<ffffffff81212611>] ? alloc_buffer_head+0x21/0x70
[<ffffffffa0181381>] ext4_bmap+0x81/0xf0 [ext4]
[<ffffffff811f8c1e>] bmap+0x1e/0x30
[<ffffffffa0169fc8>] jbd2_journal_bmap+0x28/0xa0 [jbd2]
[<ffffffffa016a0b2>] jbd2_journal_next_log_block+0x72/0x80 [jbd2]
[<ffffffffa0161668>] jbd2_journal_commit_transaction+0x798/0x19a0 [jbd2]
[<ffffffff81013588>] ? __switch_to+0xf8/0x4b0
[<ffffffffa0166d79>] kjournald2+0xc9/0x260 [jbd2]
[<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[<ffffffffa0166cb0>] ? commit_timeout+0x10/0x10 [jbd2]
[<ffffffff810a5aef>] kthread+0xcf/0xe0
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[<ffffffff816469d8>] ret_from_fork+0x58/0x90
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140

Maloo report: https://testing.hpdd.intel.com/test_sets/bc7f8634-530a-11e6-bf87-5254006e85c2 |
| Comments |
| Comment by Jian Yu [ 26/Jul/16 ] |
|
This is affecting patch review testing on master branch. |
| Comment by Oleg Drokin [ 26/Jul/16 ] |
|
So I see there was a crash and a crashdump was generated. |
| Comment by Jian Yu [ 26/Jul/16 ] |
|
Sure, Oleg, please see the attached file. The vmcore is under /scratch/dumps/onyx-57vm3.onyx.hpdd.intel.com/10.2.5.84-2016-07-25-20:07:03 on Onyx test cluster. |
| Comment by Jian Yu [ 26/Jul/16 ] |
|
More failure instances on master branch: |
| Comment by Andreas Dilger [ 05/Aug/20 ] |
|
Closing old issue that has not been seen in a long time. |