Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.4.0, Lustre 2.1.4
-
None
-
Lustre Tag: v2_1_4_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/159/
Distro/Arch: RHEL6.3/x86_64 (kernel version: 2.6.32_279.14.1.el6)
-
3
-
5875
Description
lfsck test hung as follows:
<~snip~> creating /mnt/lustre/d0.lfsck/testfile.30 creating files in /mnt/lustre/d0.lfsck/d9 creating files in /mnt/lustre/d0.lfsck/d10 creating files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 linking files in /mnt/lustre/d0.lfsck/d11 CMD: client-26vm8 lctl get_param -n obdfilter.*.uuid CMD: client-26vm8 lctl get_param -n obdfilter.lustre-OST0000.mntdev Stopping clients: client-26vm1,client-26vm2.lab.whamcloud.com /mnt/lustre (opts:-f)
Console log on client-26vm2 showed that:
Lustre: 4267:0:(llite_lib.c:2290:ll_dirty_page_discard_warn()) dirty page discard: 10.10.4.154@tcp:/lustre/fid: [0x200002b13:0x1224:0x0]/ may get corrupted (rc -108) Lustre: 4267:0:(llite_lib.c:2290:ll_dirty_page_discard_warn()) dirty page discard: 10.10.4.154@tcp:/lustre/fid: [0x200002b13:0x1237:0x0]/ may get corrupted (rc -108) Lustre: 22194:0:(cl_lock.c:2026:cl_lock_page_out()) Writing 1 pages error: -108 BUG: soft lockup - CPU#0 stuck for 67s! [umount:22194] Modules linked in: lustre(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) ext2 nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs] CPU 0 Modules linked in: lustre(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) ext2 nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs] Pid: 22194, comm: umount Not tainted 2.6.32-279.14.1.el6.x86_64 #1 Red Hat KVM RIP: 0010:[<ffffffff8150098e>] [<ffffffff8150098e>] _spin_lock+0x1e/0x30 RSP: 0018:ffff88007b7ef508 EFLAGS: 00000206 RAX: 0000000000000001 RBX: ffff88007b7ef508 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88007ae2c400 RDI: ffff880076b3a734 RBP: ffffffff8100bc0e R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800760154e0 R13: ffff8800760154e0 R14: ffff880024017cf0 R15: ffff88007b7ef488 FS: 00007f7464baa740(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f74642c81b0 CR3: 0000000078444000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount (pid: 22194, threadinfo ffff88007b7ee000, task ffff88007d059500) Stack: ffff88007b7ef5a8 ffffffffa08ee29c ffff8800754f3d98 ffff880073c08400 <d> ffff88007ae2c400 ffff880078d76c38 ffff88007ae2c510 ffff88007ae2c4f0 <d> ffff880076b3a648 0000000100000001 ffff88007ae2c4f8 ffff88007d0449e8 Call Trace: [<ffffffffa08ee29c>] ? osc_io_submit+0x31c/0x540 [osc] [<ffffffffa081c148>] ? cl_io_submit_rw+0x78/0x180 [obdclass] [<ffffffffa0958ad5>] ? lov_io_submit+0x355/0xc30 [lov] [<ffffffffa0813c65>] ? cl_page_put+0x1e5/0x580 [obdclass] [<ffffffffa081680a>] ? pgoff_at_lock+0x1a/0x70 [obdclass] [<ffffffffa081c148>] ? cl_io_submit_rw+0x78/0x180 [obdclass] [<ffffffffa081f29f>] ? cl_io_submit_sync+0x9f/0x210 [obdclass] [<ffffffffa08166c7>] ? cl_lock_page_out+0x207/0x330 [obdclass] [<ffffffffa0816860>] ? pageout_cb+0x0/0x100 [obdclass] [<ffffffffa08ea99f>] ? osc_lock_flush+0x4f/0x90 [osc] [<ffffffffa08eaaaf>] ? osc_lock_cancel+0xcf/0x1b0 [osc] [<ffffffffa08152a5>] ? cl_lock_cancel0+0x75/0x160 [obdclass] [<ffffffffa0815f8b>] ? cl_lock_cancel+0x13b/0x140 [obdclass] [<ffffffffa08ebb8a>] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc] [<ffffffffa0420cc0>] ? ldlm_cancel_callback+0x60/0x100 [ptlrpc] [<ffffffffa043b14b>] ? ldlm_cli_cancel_local+0x7b/0x380 [ptlrpc] [<ffffffffa043efd8>] ? ldlm_cli_cancel+0x58/0x3a0 [ptlrpc] [<ffffffffa0428db8>] ? cleanup_resource+0x168/0x300 [ptlrpc] [<ffffffffa06baefe>] ? cfs_hash_spin_lock+0xe/0x10 [libcfs] [<ffffffffa0428f7a>] ? ldlm_resource_clean+0x2a/0x50 [ptlrpc] [<ffffffffa06bb18f>] ? cfs_hash_for_each_relax+0x17f/0x380 [libcfs] [<ffffffffa0428f50>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc] [<ffffffffa0428f50>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc] [<ffffffffa06bcbaf>] ? cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs] [<ffffffffa0425a49>] ? ldlm_namespace_cleanup+0x29/0xb0 [ptlrpc] [<ffffffffa08e2122>] ? osc_import_event+0x3d2/0x1470 [osc] [<ffffffffa0487c59>] ? ptlrpc_invalidate_import+0x2a9/0x8d0 [ptlrpc] [<ffffffffa08e283c>] ? osc_import_event+0xaec/0x1470 [osc] [<ffffffffa045f53d>] ? ptlrpc_set_import_active+0x33d/0x480 [ptlrpc] [<ffffffffa08dd070>] ? osc_iocontrol+0xc80/0x11b0 [osc] [<ffffffff8127cf26>] ? vsnprintf+0x2b6/0x5f0 [<ffffffffa092f7ff>] ? lov_iocontrol+0x27f/0x1e90 [lov] [<ffffffffa064eadf>] ? lmv_iocontrol+0x25f/0x1cf0 [lmv] [<ffffffff81188961>] ? path_put+0x31/0x40 [<ffffffff81188325>] ? putname+0x35/0x50 [<ffffffffa09dd640>] ? ll_umount_begin+0x2c0/0xc30 [lustre] [<ffffffff8119b52a>] ? sys_umount+0x11a/0x3a0 [<ffffffff810d6d42>] ? audit_syscall_entry+0x272/0x2a0 [<ffffffff81503ade>] ? do_page_fault+0x3e/0xa0 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
The full console log is attached.
Maloo report: https://maloo.whamcloud.com/test_sets/679bdff0-4828-11e2-a406-52540035b04c