[LU-6699] LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) ASSERTION Created: 09/Jun/15 Updated: 22/Jul/18 Resolved: 22/Jul/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Dave Bond (Inactive) | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: | RHEL6 |
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Description |
|
We have just upgraded our servers to 2.7. This has caused one of the MDS nodes to assert. Message from syslogd@cs04r-sc-mds03-02 at Jun 9 16:56:29 ... Could you advise a suitable course of action? |
| Comments |
| Comment by Andreas Dilger [ 09/Jun/15 ] |
|
Could you please provide the rest of the stack trace below "lbug_with_loc"? |
| Comment by Peter Jones [ 09/Jun/15 ] |
|
Alex, could you please advise? Thanks, Peter |
| Comment by Dave Bond (Inactive) [ 10/Jun/15 ] |
|
Preceding messages:
Jun 9 11:28:23 cs04r-sc-mds03-02 kernel: LustreError: 8077:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 11:28:23 cs04r-sc-mds03-02 kernel: LustreError: 8077:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 52990 of catalog 0x8:10 rc=-2
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81450a6a>] ? skb_release_head_state+0x6a/0x110
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8145086e>] ? __kfree_skb+0x1e/0xa0
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa0310182>] ? bnx2x_drain_tx_queues+0xd2/0x140 [bnx2x]
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun 9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun 9 11:35:26 cs04r-sc-mds03-02 kernel: LustreError: 7140:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 11:35:26 cs04r-sc-mds03-02 kernel: LustreError: 7140:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53002 of catalog 0x8:10 rc=-2
Jun 9 11:36:24 cs04r-sc-mds03-02 kernel: LustreError: 8124:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 11:36:24 cs04r-sc-mds03-02 kernel: LustreError: 8124:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53004 of catalog 0x8:10 rc=-2
Jun 9 11:57:34 cs04r-sc-mds03-02 kernel: LustreError: 8149:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 11:57:34 cs04r-sc-mds03-02 kernel: LustreError: 8149:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53028 of catalog 0x8:10 rc=-2
Jun 9 12:06:33 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 12:06:33 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53035 of catalog 0x8:10 rc=-2
Jun 9 12:21:21 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 12:21:21 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53047 of catalog 0x8:10 rc=-2
Jun 9 12:26:42 cs04r-sc-mds03-02 kernel: LustreError: 8086:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 12:26:42 cs04r-sc-mds03-02 kernel: LustreError: 8086:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53051 of catalog 0x8:10 rc=-2
Jun 9 13:12:41 cs04r-sc-mds03-02 kernel: LustreError: 8124:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 13:12:41 cs04r-sc-mds03-02 kernel: LustreError: 8124:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53090 of catalog 0x8:10 rc=-2
Jun 9 13:48:26 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 13:48:26 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53117 of catalog 0x8:10 rc=-2
Jun 9 13:53:05 cs04r-sc-mds03-02 kernel: LustreError: 7433:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 13:53:05 cs04r-sc-mds03-02 kernel: LustreError: 7433:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53120 of catalog 0x8:10 rc=-2
Jun 9 13:58:31 cs04r-sc-mds03-02 kernel: LustreError: 7176:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 13:58:31 cs04r-sc-mds03-02 kernel: LustreError: 7176:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53124 of catalog 0x8:10 rc=-2
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: kswapd0: page allocation failure. order:2, mode:0x20
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: Pid: 340, comm: kswapd0 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03153c7>] ? bnx2x_rx_int+0xfc7/0x1670 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03122a1>] ? bnx2x_msix_fp_int+0xd1/0x170 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d90f>] ? __do_softirq+0x11f/0x1e0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff81175f52>] ? kfree+0x122/0x320
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0eb605d>] ? osd_object_free+0x11d/0x160 [osd_ldiskfs]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0573d43>] ? lu_object_free+0x113/0x1a0 [obdclass]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0574c07>] ? lu_site_purge+0x2e7/0x4f0 [obdclass]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0574f98>] ? lu_cache_shrink+0x188/0x310 [obdclass]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8113d8da>] ? shrink_slab+0x11a/0x1a0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81140c5a>] ? balance_pgdat+0x57a/0x800
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81141014>] ? kswapd+0x134/0x3b0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81140ee0>] ? kswapd+0x0/0x3b0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: kswapd0: page allocation failure. order:2, mode:0x20
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: Pid: 340, comm: kswapd0 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03153c7>] ? bnx2x_rx_int+0xfc7/0x1670 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03122a1>] ? bnx2x_msix_fp_int+0xd1/0x170 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d90f>] ? __do_softirq+0x11f/0x1e0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff81175f52>] ? kfree+0x122/0x320
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0eb605d>] ? osd_object_free+0x11d/0x160 [osd_ldiskfs]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0573d43>] ? lu_object_free+0x113/0x1a0 [obdclass]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0574c07>] ? lu_site_purge+0x2e7/0x4f0 [obdclass]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0574f98>] ? lu_cache_shrink+0x188/0x310 [obdclass]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8113d8da>] ? shrink_slab+0x11a/0x1a0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81140c5a>] ? balance_pgdat+0x57a/0x800
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81141014>] ? kswapd+0x134/0x3b0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81140ee0>] ? kswapd+0x0/0x3b0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff814b9133>] ? tcp_v4_do_rcv+0x2e3/0x490
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81450a6a>] ? skb_release_head_state+0x6a/0x110
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03102f2>] ? bnx2x_napi_disable_cnic+0x102/0x120 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun 9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun 9 14:20:37 cs04r-sc-mds03-02 kernel: LustreError: 8091:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 14:20:37 cs04r-sc-mds03-02 kernel: LustreError: 8091:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53140 of catalog 0x8:10 rc=-2
Jun 9 14:25:39 cs04r-sc-mds03-02 kernel: LustreError: 8075:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 9 14:25:39 cs04r-sc-mds03-02 kernel: LustreError: 8075:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53144 of catalog 0x8:10 rc=-2
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81450897>] ? __kfree_skb+0x47/0xa0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa03102fa>] ? bnx2x_napi_disable_cnic+0x10a/0x120 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa03153c7>] ? bnx2x_rx_int+0xfc7/0x1670 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8101fa8a>] ? amd_pmu_cpu_prepare+0x7a/0x100
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81039787>] ? native_apic_msr_write+0x37/0x40
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81030042>] ? generic_set_all+0xb2/0x340
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: mdt00_028: page allocation failure. order:2, mode:0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 8137, comm: mdt00_028 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d90f>] ? __do_softirq+0x11f/0x1e0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81133ea0>] ? drain_local_pages+0x0/0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff810b743e>] ? smp_call_function_many+0x1ee/0x260
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81133ea0>] ? drain_local_pages+0x0/0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810b74d2>] ? smp_call_function+0x22/0x30
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d594>] ? on_each_cpu+0x24/0x50
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81131d8c>] ? drain_all_pages+0x1c/0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8113465d>] ? __alloc_pages_nodemask+0x5ed/0x8d0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07c5bd5>] ? null_alloc_rs+0xc5/0x390 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07c5bd5>] ? null_alloc_rs+0xc5/0x390 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07b4684>] ? sptlrpc_svc_alloc_rs+0x74/0x360 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa078afad>] ? lustre_pack_reply_v2+0x9d/0x280 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa078b236>] ? lustre_pack_reply_flags+0xa6/0x1e0 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa078b381>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07b1ec3>] ? req_capsule_server_pack+0x53/0x100 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa1125bf5>] ? mdt_getxattr+0x635/0x1470 [mdt]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa11042c5>] ? mdt_object_lock_internal+0x65/0x360 [mdt]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa110472c>] ? mdt_intent_getxattr+0x9c/0x150 [mdt]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa05755b6>] ? lu_object_find+0x16/0x20 [obdclass]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa10fbcf4>] ? mdt_intent_policy+0x494/0xce0 [mdt]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa073f4f9>] ? ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa076b46b>] ? ldlm_handle_enqueue0+0x51b/0x13f0 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0422c8a>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07eb921>] ? tgt_enqueue+0x61/0x230 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07ec56e>] ? tgt_request_handle+0x8be/0x1000 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa079c5a1>] ? ptlrpc_main+0xe41/0x1960 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa079b760>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81450897>] ? __kfree_skb+0x47/0xa0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa03101b2>] ? bnx2x_drain_tx_queues+0x102/0x140 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff814b9133>] ? tcp_v4_do_rcv+0x2e3/0x490
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81450897>] ? __kfree_skb+0x47/0xa0
Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? 
bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x] Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0310262>] ? bnx2x_napi_disable_cnic+0x72/0x120 [bnx2x] Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x] Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110 Jun 9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? 
start_secondary+0x2be/0x301 Jun 9 14:29:55 cs04r-sc-mds03-02 kernel: LustreError: 8125:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 14:29:55 cs04r-sc-mds03-02 kernel: LustreError: 8125:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53147 of catalog 0x8:10 rc=-2 Jun 9 14:30:06 cs04r-sc-mds03-02 kernel: LustreError: 8130:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 14:30:06 cs04r-sc-mds03-02 kernel: LustreError: 8130:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53147 of catalog 0x8:10 rc=-2 Jun 9 14:52:27 cs04r-sc-mds03-02 kernel: LustreError: 8089:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 14:52:27 cs04r-sc-mds03-02 kernel: LustreError: 8089:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53177 of catalog 0x8:10 rc=-2 Jun 9 14:56:40 cs04r-sc-mds03-02 kernel: LustreError: 8121:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 14:56:40 cs04r-sc-mds03-02 kernel: LustreError: 8121:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53181 of catalog 0x8:10 rc=-2 Jun 9 15:06:27 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:06:27 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53191 of catalog 0x8:10 rc=-2 Jun 9 15:06:37 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:06:37 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53192 of catalog 0x8:10 
rc=-2 Jun 9 15:09:51 cs04r-sc-mds03-02 kernel: LustreError: 8148:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:09:51 cs04r-sc-mds03-02 kernel: LustreError: 8148:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53196 of catalog 0x8:10 rc=-2 Jun 9 15:19:17 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:19:17 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53204 of catalog 0x8:10 rc=-2 Jun 9 15:28:50 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:28:50 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53212 of catalog 0x8:10 rc=-2 Jun 9 15:37:10 cs04r-sc-mds03-02 kernel: LustreError: 8089:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:37:10 cs04r-sc-mds03-02 kernel: LustreError: 8089:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53219 of catalog 0x8:10 rc=-2 Jun 9 15:40:18 cs04r-sc-mds03-02 kernel: LustreError: 8107:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:40:18 cs04r-sc-mds03-02 kernel: LustreError: 8107:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53223 of catalog 0x8:10 rc=-2 Jun 9 15:42:49 cs04r-sc-mds03-02 kernel: LustreError: 7142:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:42:49 cs04r-sc-mds03-02 kernel: LustreError: 7142:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53232 of catalog 0x8:10 rc=-2 Jun 9 15:47:41 
cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 15:47:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 9 previous similar messages Jun 9 15:47:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53263 of catalog 0x8:10 rc=-2 Jun 9 15:47:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 9 previous similar messages Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: Call Trace: Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff814b9133>] ? tcp_v4_do_rcv+0x2e3/0x490 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? 
swiotlb_sync_single+0x28/0xd0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa031055a>] ? bnx2x_free_msix_irqs+0x8a/0x190 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: Call Trace: Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? 
__alloc_pages_nodemask+0x74a/0x8d0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa03102fa>] ? bnx2x_napi_disable_cnic+0x10a/0x120 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff810b0760>] ? tick_sched_timer+0x0/0xc0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? 
irq_exit+0x85/0x90 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: Call Trace: Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: <IRQ> [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? 
bnx2x_poll+0x10f/0x400 [bnx2x] Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: <EOI> [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110 Jun 9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? 
start_secondary+0x2be/0x301 Jun 9 16:01:46 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 16:01:46 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 14 previous similar messages Jun 9 16:01:46 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53286 of catalog 0x8:10 rc=-2 Jun 9 16:01:46 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 14 previous similar messages Jun 9 16:18:57 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 16:18:57 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 3 previous similar messages Jun 9 16:18:57 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53302 of catalog 0x8:10 rc=-2 Jun 9 16:18:57 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 3 previous similar messages Jun 9 16:40:16 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 16:40:16 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 2 previous similar messages Jun 9 16:40:16 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53323 of catalog 0x8:10 rc=-2 Jun 9 16:40:16 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 2 previous similar messages Jun 9 16:41:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 
1 llog-records: rc = -2 Jun 9 16:41:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 2 previous similar messages Jun 9 16:41:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53326 of catalog 0x8:10 rc=-2 Jun 9 16:41:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 2 previous similar messages Jun 9 16:45:02 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 16:45:02 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 1 previous similar message Jun 9 16:45:02 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53331 of catalog 0x8:10 rc=-2 Jun 9 16:45:02 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 1 previous similar message Jun 9 16:50:09 cs04r-sc-mds03-02 kernel: LustreError: 7548:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2 Jun 9 16:50:09 cs04r-sc-mds03-02 kernel: LustreError: 7548:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 3 previous similar messages Jun 9 16:50:09 cs04r-sc-mds03-02 kernel: LustreError: 7548:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53340 of catalog 0x8:10 rc=-2 Jun 9 16:50:09 cs04r-sc-mds03-02 kernel: LustreError: 7548:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 3 previous similar messages The actual failure: Jun 9 16:55:05 cs04r-sc-mds03-02 kernel: LustreError: 8140:0:(llog_cat.c:163:llog_cat_id2handle()) lustre03-MDD0000: error opening log id 0x11564:1:0: rc = -2 Jun 9 16:55:05 cs04r-sc-mds03-02 kernel: LustreError: 8140:0:(llog_cat.c:537:llog_cat_process_cb()) lustre03-MDD0000: cannot find handle 
for llog 0x11564:1: -2 Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) ASSERTION(!lu_object_is_dying(dt->do_lu.lo_header) ) failed: Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) LBUG Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: Pid: 7605, comm: mdt02_006 Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: Call Trace: Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0410895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0410e97>] lbug_with_loc+0x47/0xb0 [libcfs] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0ebc011>] osd_object_destroy+0x3b1/0x460 [osd_ldiskfs] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0536cf4>] llog_osd_destroy+0x5d4/0xd40 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052d3f1>] llog_destroy+0x51/0x170 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052f0c8>] llog_cat_process_cb+0x3a8/0x5f0 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052a5a9>] llog_process_thread+0xaa9/0xe80 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052ed20>] ? llog_cat_process_cb+0x0/0x5f0 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052aabf>] llog_process_or_fork+0x13f/0x540 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052dc5d>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa1242f50>] ? llog_changelog_cancel_cb+0x0/0x1d0 [mdd] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052ddc9>] llog_cat_process+0x19/0x20 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa1242d7f>] llog_changelog_cancel+0x5f/0x230 [mdd] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa04211c1>] ? 
libcfs_debug_msg+0x41/0x50 [libcfs] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0530df8>] llog_cancel+0x58/0x240 [obdclass] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa12491aa>] mdd_changelog_user_purge+0x46a/0x6f0 [mdd] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa1249a8c>] mdd_iocontrol+0x65c/0xb70 [mdd] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa10f8119>] mdt_ioc_child+0x149/0x1d0 [mdt] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa1104c4b>] mdt_iocontrol+0x2fb/0x8e0 [mdt] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa11054b1>] mdt_set_info+0x281/0x430 [mdt] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa078b381>] ? lustre_pack_reply+0x11/0x20 [ptlrpc] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa07ec56e>] tgt_request_handle+0x8be/0x1000 [ptlrpc] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa079c5a1>] ptlrpc_main+0xe41/0x1960 [ptlrpc] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8106c4f0>] ? pick_next_task_fair+0xd0/0x130 Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa079b760>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0 Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20 Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0 Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20 Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: |
| Comment by Andreas Dilger [ 10/Jun/15 ] |
|
What version of Lustre did you upgrade from? |
| Comment by Andreas Dilger [ 10/Jun/15 ] |
|
Also, is this hitting repeatedly, or did it go away when the system was restarted? Mike, it looks like this is related to llog handling during ChangeLog processing. Is it possible there is a race with multiple threads cancelling the same records? In any case, there shouldn't be an LASSERT() when deleting a log file that is already being deleted. |
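To illustrate the last point, here is a minimal sketch (not Lustre code) of a destroy path that reports an error when the object is already being deleted instead of asserting. The names `demo_object` and `demo_object_destroy()` are hypothetical, and the `dying` flag only loosely mirrors what `lu_object_is_dying()` checks. The sketch is single-threaded; real code would still need locking around the check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an on-disk object; not a Lustre structure. */
struct demo_object {
    bool dying;      /* set once a destroy has started */
    bool destroyed;  /* set once the destroy has completed */
};

/*
 * Destroy the object, tolerating a repeated request.
 * Returns 0 on success, or -ENOENT if a destroy was already started,
 * which matches the "rc = -2" seen in the llog messages above.
 */
int demo_object_destroy(struct demo_object *obj)
{
    if (obj->dying)
        return -ENOENT;  /* already being deleted: report it, do not assert */

    obj->dying = true;
    /* ... the actual removal work would happen here ... */
    obj->destroyed = true;
    return 0;
}
```

With this shape, a second caller racing to delete the same log would see an error return it can handle, rather than taking down the server with an LBUG.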
| Comment by Dave Bond (Inactive) [ 11/Jun/15 ] |
|
This system was upgraded from 2.5, and I have only seen this once. We did, however, have an issue with an MDS not responding after a minor network outage, but I do not have a compelling set of logs to suggest the two are related. The console output is below:
Jun 8 13:27:57 cs04r-sc-mds03-01 kernel: LNet: There was an unexpected network error while writing to 172.23.148.22: -110.
Jun 8 13:30:32 cs04r-sc-mds03-01 kernel: Lustre: MGS: haven't heard from client de0451fe-3e87-4bcb-2ca6-d2af988671be (at 172.23.148.35@tcp) in 227 seconds. I think it's dead, and I am evicting it. exp ffff881fcf0bcc00, cur 1433766632 expire 1433766482 last 1433766405
Jun 8 13:30:32 cs04r-sc-mds03-01 kernel: Lustre: Skipped 1 previous similar message
Jun 8 13:30:48 cs04r-sc-mds03-01 kernel: Lustre: lustre03-MDT0000: Client bb255a22-f3c1-835b-8049-eab34c95ba65 (at 172.23.148.64@tcp) reconnecting
Jun 8 13:30:59 cs04r-sc-mds03-01 kernel: Lustre: lustre03-MDT0000: Client db5a1353-f37b-fe0a-ccf8-9bc50f7a62ad (at 172.23.148.65@tcp) reconnecting
Jun 8 13:31:03 cs04r-sc-mds03-01 kernel: Lustre: MGS: Client b85575c0-8d63-0c39-a18e-c25179bf68dd (at 172.23.148.26@tcp) reconnecting
Jun 8 13:31:08 cs04r-sc-mds03-01 kernel: Lustre: MGS: Client 0e2a3416-2996-0da3-aab5-16ab1d68433f (at 172.23.148.24@tcp) reconnecting
Jun 8 13:31:08 cs04r-sc-mds03-01 kernel: Lustre: Skipped 2 previous similar messages
Jun 8 13:31:38 cs04r-sc-mds03-01 kernel: Lustre: MGS: Client 8945eb8e-242f-a306-9ce7-98c47b58cd6c (at 172.23.148.38@tcp) reconnecting
Jun 8 13:31:38 cs04r-sc-mds03-01 kernel: Lustre: Skipped 1 previous similar message
Jun 8 13:38:59 cs04r-sc-mds03-01 kernel: LustreError: 20218:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 8 13:38:59 cs04r-sc-mds03-01 kernel: LustreError: 20218:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 18 previous similar messages
Jun 8 13:38:59 cs04r-sc-mds03-01 kernel: LustreError: 20218:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 52222 of catalog 0x8:10 rc=-2
Jun 8 13:38:59 cs04r-sc-mds03-01 kernel: LustreError: 20218:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 18 previous similar messages
Jun 8 13:49:04 cs04r-sc-mds03-01 kernel: LustreError: 18959:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 8 13:49:04 cs04r-sc-mds03-01 kernel: LustreError: 18959:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 17 previous similar messages
Jun 8 13:49:04 cs04r-sc-mds03-01 kernel: LustreError: 18959:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 52247 of catalog 0x8:10 rc=-2
Jun 8 13:49:04 cs04r-sc-mds03-01 kernel: LustreError: 18959:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 17 previous similar messages
Jun 8 14:02:30 cs04r-sc-mds03-01 kernel: LustreError: 18965:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun 8 14:02:30 cs04r-sc-mds03-01 kernel: LustreError: 18965:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 25 previous similar messages
Jun 8 14:02:30 cs04r-sc-mds03-01 kernel: LustreError: 18965:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 52272 of catalog 0x8:10 rc=-2
Jun 8 14:02:30 cs04r-sc-mds03-01 kernel: LustreError: 18965:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 25 previous similar messages
Jun 8 14:04:33 cs04r-sc-mds03-01 kernel: LustreError: 19000:0:(llog_cat.c:163:llog_cat_id2handle()) lustre03-MDD0000: error opening log id 0x10f8e:1:0: rc = -2
Jun 8 14:04:33 cs04r-sc-mds03-01 kernel: LustreError: 19000:0:(llog_cat.c:537:llog_cat_process_cb()) lustre03-MDD0000: cannot find handle for llog 0x10f8e:1: -2 |
| Comment by nasf (Inactive) [ 09/Feb/16 ] |
|
[delete unrelated test failure] |
| Comment by Alex Zhuravlev [ 09/Feb/16 ] |
|
[delete unrelated test failure] |
| Comment by Alexander Zarochentsev [ 10/Aug/16 ] |
|
Mike, we had the following fix for Lustre-2.1:
MRP-1443 llog: avoid llog cancel race
Concurrently running two or more lfs changelog_clear need to be
protected against races. llog_process_thread() used to read llogs
without taking into account that the llog being read may be destroyed
by another process. This patch serializes changelog cancellings using
llog_ctxt's mutex.
diff --git a/lustre/mdd/mdd_device.c b/lustre/mdd/mdd_device.c
index 1642f0a..8140208 100644
--- a/lustre/mdd/mdd_device.c
+++ b/lustre/mdd/mdd_device.c
@@ -386,6 +386,7 @@ int mdd_changelog_llog_cancel(const struct lu_env *env,
if (ctxt == NULL)
return -ENXIO;
+ cfs_mutex_lock(&ctxt->loc_mutex);
cfs_spin_lock(&mdd->mdd_cl.mc_lock);
cur = (long long)mdd->mdd_cl.mc_index;
cfs_spin_unlock(&mdd->mdd_cl.mc_lock);
@@ -413,6 +414,7 @@ int mdd_changelog_llog_cancel(const struct lu_env *env,
rc = llog_cancel(ctxt, NULL, 1, (struct llog_cookie *)&endrec, 0);
out:
+ cfs_mutex_unlock(&ctxt->loc_mutex);
llog_ctxt_put(ctxt);
return rc;
}
Do you think it might be useful for 2.7+? |
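For readers unfamiliar with the pattern, the serialization described in the patch can be sketched in user space with a POSIX mutex. This is only an illustration under stated assumptions: `demo_ctxt`, `demo_ctxt_init()` and `demo_changelog_cancel()` are hypothetical names, the mutex merely plays the role of `loc_mutex`, and none of this is the Lustre implementation.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a changelog llog context; not the Lustre struct. */
struct demo_ctxt {
    pthread_mutex_t lock;          /* plays the role of llog_ctxt's loc_mutex */
    bool            log_exists;    /* whether the log file is still present */
    int             destroy_count; /* how many times the log was destroyed */
};

void demo_ctxt_init(struct demo_ctxt *ctxt)
{
    pthread_mutex_init(&ctxt->lock, NULL);
    ctxt->log_exists = true;
    ctxt->destroy_count = 0;
}

/*
 * Cancel the log. The whole check-then-destroy sequence runs under one
 * mutex, so a second caller cannot observe the log half-way through its
 * destruction. It either destroys the log itself or sees it already gone.
 */
int demo_changelog_cancel(struct demo_ctxt *ctxt)
{
    int rc = 0;

    pthread_mutex_lock(&ctxt->lock);
    if (!ctxt->log_exists) {
        rc = -ENOENT;              /* another caller already removed it */
    } else {
        ctxt->log_exists = false;
        ctxt->destroy_count++;
    }
    pthread_mutex_unlock(&ctxt->lock);

    return rc;
}
```

If two `lfs changelog_clear` requests map to two concurrent calls of `demo_changelog_cancel()`, one returns 0 and the other returns `-ENOENT`, and the log is destroyed exactly once. The cost is that cancels on the same context no longer run in parallel.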
| Comment by Rahul Deshmukh (Inactive) [ 11/Aug/16 ] |
|
Created the bug https://jira.hpdd.intel.com/browse/LU-8496 (Race is changelog clear path). The assertion is different, but the code path seems to be the same, so please review. |
| Comment by Mikhail Pershin [ 22/Jul/18 ] |
|
Outdated issue. Several patches have since landed to fix llog races and related issues, so this one may already be fixed. Reopen if it appears again. |