[LU-6699] LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) ASSERTION Created: 09/Jun/15  Updated: 22/Jul/18  Resolved: 22/Jul/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Dave Bond (Inactive) Assignee: Mikhail Pershin
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment: RHEL6

Issue Links:
Related
is related to LU-8496 Race is changelog clear path Closed
Severity: 3

 Description   

We have just upgraded our servers to Lustre 2.7. Since the upgrade, one of the MDS nodes has hit an assertion (LBUG).

Message from syslogd@cs04r-sc-mds03-02 at Jun 9 16:56:29 ...
kernel:LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header) ) failed:
Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) LBUG
Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: Pid: 7605, comm: mdt02_006
Jun 9 16:56:29 cs04r-sc-mds03-02 kernel:
Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: Call Trace:
Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0410895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jun 9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0410e97>] lbug_with_loc+0x47/0xb0 [libcfs]

Could you advise a suitable course of action?
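
For triage, the LustreError prefix in the messages above can be split into its fields (thread PID, source file, line, function). The following is only a rough sketch based on the line layout visible in this ticket; the helper name `parse_lustre_error` and the field name `extra` are illustrative and not part of Lustre, and the meaning of the second number in the prefix is not stated here, so it is kept as an opaque value.

```python
import re

# Layout assumed from the messages quoted in this ticket:
#   LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>
# The second number (<n>) is kept as an opaque string.
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):(?P<extra>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)"
)


def parse_lustre_error(log_line):
    """Return a dict of fields, or None if the line is not a LustreError."""
    match = LUSTRE_ERROR.search(log_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


# Sample copied from the assertion message in the description.
sample = (
    "kernel:LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) "
    "ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header) ) failed:"
)
print(parse_lustre_error(sample))
```

Grouping the parsed records by `pid` or `func` makes it easier to see whether the asserting thread logged other errors earlier.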



 Comments   
Comment by Andreas Dilger [ 09/Jun/15 ]

Could you please provide the rest of the stack trace below "lbug_with_loc"?
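
One possible way to pull the complete trace out of a saved syslog is sketched below. It assumes the log is available as a list of text lines in the same layout as the excerpt in the description; the function name `trace_after_lbug` is made up for illustration. It does not handle traces from several CPUs interleaved in the log, so the result should be checked against the raw file.

```python
import re

# A stack frame looks like "[<ffffffffa0410895>] symbol+0x55/0x80 [module]",
# optionally with "? " before the symbol.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\? )?(\S+)")


def trace_after_lbug(lines):
    """Collect the symbol+offset frames logged after the first LBUG line."""
    frames = []
    seen_lbug = False
    in_trace = False
    for line in lines:
        if not seen_lbug:
            # Skip everything up to and including the LBUG marker line.
            seen_lbug = line.rstrip().endswith("LBUG")
            continue
        if not in_trace:
            # Lines such as "Pid: ..." sit between LBUG and the trace header.
            in_trace = "Call Trace:" in line
            continue
        match = FRAME.search(line)
        if match is None:
            break  # the first non-frame line ends the trace
        frames.append(match.group(1))
    return frames


# Sample lines copied from the description of this ticket.
host = "Jun 9 16:56:29 cs04r-sc-mds03-02 kernel:"
sample = [
    host + " LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) LBUG",
    host + " Pid: 7605, comm: mdt02_006",
    host,
    host + " Call Trace:",
    host + " [<ffffffffa0410895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]",
    host + " [<ffffffffa0410e97>] lbug_with_loc+0x47/0xb0 [libcfs]",
]
print(trace_after_lbug(sample))
```

Run against the full log rather than this two-frame excerpt, the same function would return every frame recorded below `lbug_with_loc`.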

Comment by Peter Jones [ 09/Jun/15 ]

Alex

Could you please advise?

Thanks

Peter

Comment by Dave Bond (Inactive) [ 10/Jun/15 ]

Preceding messages:

Jun  9 11:28:23 cs04r-sc-mds03-02 kernel: LustreError: 8077:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 11:28:23 cs04r-sc-mds03-02 kernel: LustreError: 8077:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 52990 of catalog 0x8:10 rc=-2
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81450a6a>] ? skb_release_head_state+0x6a/0x110
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8145086e>] ? __kfree_skb+0x1e/0xa0
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa0310182>] ? bnx2x_drain_tx_queues+0xd2/0x140 [bnx2x]
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 11:34:22 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 11:35:26 cs04r-sc-mds03-02 kernel: LustreError: 7140:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 11:35:26 cs04r-sc-mds03-02 kernel: LustreError: 7140:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53002 of catalog 0x8:10 rc=-2
Jun  9 11:36:24 cs04r-sc-mds03-02 kernel: LustreError: 8124:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 11:36:24 cs04r-sc-mds03-02 kernel: LustreError: 8124:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53004 of catalog 0x8:10 rc=-2
Jun  9 11:57:34 cs04r-sc-mds03-02 kernel: LustreError: 8149:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 11:57:34 cs04r-sc-mds03-02 kernel: LustreError: 8149:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53028 of catalog 0x8:10 rc=-2
Jun  9 12:06:33 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 12:06:33 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53035 of catalog 0x8:10 rc=-2
Jun  9 12:21:21 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 12:21:21 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53047 of catalog 0x8:10 rc=-2
Jun  9 12:26:42 cs04r-sc-mds03-02 kernel: LustreError: 8086:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 12:26:42 cs04r-sc-mds03-02 kernel: LustreError: 8086:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53051 of catalog 0x8:10 rc=-2
Jun  9 13:12:41 cs04r-sc-mds03-02 kernel: LustreError: 8124:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 13:12:41 cs04r-sc-mds03-02 kernel: LustreError: 8124:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53090 of catalog 0x8:10 rc=-2
Jun  9 13:48:26 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 13:48:26 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53117 of catalog 0x8:10 rc=-2
Jun  9 13:53:05 cs04r-sc-mds03-02 kernel: LustreError: 7433:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 13:53:05 cs04r-sc-mds03-02 kernel: LustreError: 7433:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53120 of catalog 0x8:10 rc=-2
Jun  9 13:58:31 cs04r-sc-mds03-02 kernel: LustreError: 7176:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 13:58:31 cs04r-sc-mds03-02 kernel: LustreError: 7176:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53124 of catalog 0x8:10 rc=-2
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: kswapd0: page allocation failure. order:2, mode:0x20
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: Pid: 340, comm: kswapd0 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03153c7>] ? bnx2x_rx_int+0xfc7/0x1670 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03122a1>] ? bnx2x_msix_fp_int+0xd1/0x170 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d90f>] ? __do_softirq+0x11f/0x1e0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff81175f52>] ? kfree+0x122/0x320
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0eb605d>] ? osd_object_free+0x11d/0x160 [osd_ldiskfs]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0573d43>] ? lu_object_free+0x113/0x1a0 [obdclass]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0574c07>] ? lu_site_purge+0x2e7/0x4f0 [obdclass]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0574f98>] ? lu_cache_shrink+0x188/0x310 [obdclass]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8113d8da>] ? shrink_slab+0x11a/0x1a0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81140c5a>] ? balance_pgdat+0x57a/0x800
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81141014>] ? kswapd+0x134/0x3b0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81140ee0>] ? kswapd+0x0/0x3b0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: kswapd0: page allocation failure. order:2, mode:0x20
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: Pid: 340, comm: kswapd0 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03153c7>] ? bnx2x_rx_int+0xfc7/0x1670 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03122a1>] ? bnx2x_msix_fp_int+0xd1/0x170 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d90f>] ? __do_softirq+0x11f/0x1e0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff81175f52>] ? kfree+0x122/0x320
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0eb605d>] ? osd_object_free+0x11d/0x160 [osd_ldiskfs]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0573d43>] ? lu_object_free+0x113/0x1a0 [obdclass]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0574c07>] ? lu_site_purge+0x2e7/0x4f0 [obdclass]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0574f98>] ? lu_cache_shrink+0x188/0x310 [obdclass]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8113d8da>] ? shrink_slab+0x11a/0x1a0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81140c5a>] ? balance_pgdat+0x57a/0x800
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81141014>] ? kswapd+0x134/0x3b0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81140ee0>] ? kswapd+0x0/0x3b0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff814b9133>] ? tcp_v4_do_rcv+0x2e3/0x490
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81450a6a>] ? skb_release_head_state+0x6a/0x110
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa03102f2>] ? bnx2x_napi_disable_cnic+0x102/0x120 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 14:00:47 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 14:20:37 cs04r-sc-mds03-02 kernel: LustreError: 8091:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 14:20:37 cs04r-sc-mds03-02 kernel: LustreError: 8091:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53140 of catalog 0x8:10 rc=-2
Jun  9 14:25:39 cs04r-sc-mds03-02 kernel: LustreError: 8075:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 14:25:39 cs04r-sc-mds03-02 kernel: LustreError: 8075:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53144 of catalog 0x8:10 rc=-2
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81450897>] ? __kfree_skb+0x47/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa03102fa>] ? bnx2x_napi_disable_cnic+0x10a/0x120 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa03153c7>] ? bnx2x_rx_int+0xfc7/0x1670 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8101fa8a>] ? amd_pmu_cpu_prepare+0x7a/0x100
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81039787>] ? native_apic_msr_write+0x37/0x40
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81030042>] ? generic_set_all+0xb2/0x340
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: mdt00_028: page allocation failure. order:2, mode:0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 8137, comm: mdt00_028 Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d90f>] ? __do_softirq+0x11f/0x1e0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81133ea0>] ? drain_local_pages+0x0/0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff810b743e>] ? smp_call_function_many+0x1ee/0x260
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81133ea0>] ? drain_local_pages+0x0/0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810b74d2>] ? smp_call_function+0x22/0x30
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d594>] ? on_each_cpu+0x24/0x50
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81131d8c>] ? drain_all_pages+0x1c/0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8113465d>] ? __alloc_pages_nodemask+0x5ed/0x8d0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07c5bd5>] ? null_alloc_rs+0xc5/0x390 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07c5bd5>] ? null_alloc_rs+0xc5/0x390 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07b4684>] ? sptlrpc_svc_alloc_rs+0x74/0x360 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa078afad>] ? lustre_pack_reply_v2+0x9d/0x280 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa078b236>] ? lustre_pack_reply_flags+0xa6/0x1e0 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa078b381>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07b1ec3>] ? req_capsule_server_pack+0x53/0x100 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa1125bf5>] ? mdt_getxattr+0x635/0x1470 [mdt]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa11042c5>] ? mdt_object_lock_internal+0x65/0x360 [mdt]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa110472c>] ? mdt_intent_getxattr+0x9c/0x150 [mdt]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa05755b6>] ? lu_object_find+0x16/0x20 [obdclass]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa10fbcf4>] ? mdt_intent_policy+0x494/0xce0 [mdt]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa073f4f9>] ? ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa076b46b>] ? ldlm_handle_enqueue0+0x51b/0x13f0 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0422c8a>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07eb921>] ? tgt_enqueue+0x61/0x230 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa07ec56e>] ? tgt_request_handle+0x8be/0x1000 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa079c5a1>] ? ptlrpc_main+0xe41/0x1960 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa079b760>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81450897>] ? __kfree_skb+0x47/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa03101b2>] ? bnx2x_drain_tx_queues+0x102/0x140 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff814b9133>] ? tcp_v4_do_rcv+0x2e3/0x490
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81450897>] ? __kfree_skb+0x47/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0310262>] ? bnx2x_napi_disable_cnic+0x72/0x120 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 14:28:14 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 14:29:55 cs04r-sc-mds03-02 kernel: LustreError: 8125:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 14:29:55 cs04r-sc-mds03-02 kernel: LustreError: 8125:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53147 of catalog 0x8:10 rc=-2
Jun  9 14:30:06 cs04r-sc-mds03-02 kernel: LustreError: 8130:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 14:30:06 cs04r-sc-mds03-02 kernel: LustreError: 8130:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53147 of catalog 0x8:10 rc=-2
Jun  9 14:52:27 cs04r-sc-mds03-02 kernel: LustreError: 8089:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 14:52:27 cs04r-sc-mds03-02 kernel: LustreError: 8089:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53177 of catalog 0x8:10 rc=-2
Jun  9 14:56:40 cs04r-sc-mds03-02 kernel: LustreError: 8121:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 14:56:40 cs04r-sc-mds03-02 kernel: LustreError: 8121:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53181 of catalog 0x8:10 rc=-2
Jun  9 15:06:27 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:06:27 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53191 of catalog 0x8:10 rc=-2
Jun  9 15:06:37 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:06:37 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53192 of catalog 0x8:10 rc=-2
Jun  9 15:09:51 cs04r-sc-mds03-02 kernel: LustreError: 8148:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:09:51 cs04r-sc-mds03-02 kernel: LustreError: 8148:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53196 of catalog 0x8:10 rc=-2
Jun  9 15:19:17 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:19:17 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53204 of catalog 0x8:10 rc=-2
Jun  9 15:28:50 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:28:50 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53212 of catalog 0x8:10 rc=-2
Jun  9 15:37:10 cs04r-sc-mds03-02 kernel: LustreError: 8089:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:37:10 cs04r-sc-mds03-02 kernel: LustreError: 8089:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53219 of catalog 0x8:10 rc=-2
Jun  9 15:40:18 cs04r-sc-mds03-02 kernel: LustreError: 8107:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:40:18 cs04r-sc-mds03-02 kernel: LustreError: 8107:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53223 of catalog 0x8:10 rc=-2
Jun  9 15:42:49 cs04r-sc-mds03-02 kernel: LustreError: 7142:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:42:49 cs04r-sc-mds03-02 kernel: LustreError: 7142:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53232 of catalog 0x8:10 rc=-2
Jun  9 15:47:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 15:47:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 9 previous similar messages
Jun  9 15:47:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53263 of catalog 0x8:10 rc=-2
Jun  9 15:47:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 9 previous similar messages
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff814b9133>] ? tcp_v4_do_rcv+0x2e3/0x490
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa031055a>] ? bnx2x_free_msix_irqs+0x8a/0x190 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: swapper: page allocation failure. order:2, mode:0x20
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa031250c>] ? bnx2x_free_tx_pkt+0x1cc/0x2e0 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa03102fa>] ? bnx2x_napi_disable_cnic+0x10a/0x120 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff810b0760>] ? tick_sched_timer+0x0/0xc0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-504.8.1.el6_lustre.x86_64 #1
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: <IRQ>  [<ffffffff811347ba>] ? __alloc_pages_nodemask+0x74a/0x8d0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811736e2>] ? kmem_getpages+0x62/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff811742fa>] ? fallback_alloc+0x1ba/0x270
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81173d4f>] ? cache_grow+0x2cf/0x320
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174079>] ? ____cache_alloc_node+0x99/0x160
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81174cc9>] ? __kmalloc+0x199/0x230
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa030fa27>] ? bnx2x_frag_alloc+0x17/0x20 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314277>] ? bnx2x_alloc_rx_data+0x47/0x1d0 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812a2c88>] ? swiotlb_sync_single+0x28/0xd0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0314ea9>] ? bnx2x_rx_int+0xaa9/0x1670 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8145a2c3>] ? __napi_complete+0x23/0x40
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffffa0315c8f>] ? bnx2x_poll+0x10f/0x400 [bnx2x]
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81462a23>] ? net_rx_action+0x103/0x2f0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff810eaec0>] ? handle_IRQ_event+0x60/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100fb55>] ? do_softirq+0x65/0xa0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81534405>] ? do_IRQ+0x75/0xf0
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: <EOI>  [<ffffffff812eaf5e>] ? intel_idle+0xde/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff812eaf41>] ? intel_idle+0xc1/0x170
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81426517>] ? cpuidle_idle_call+0xa7/0x140
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Jun  9 16:00:44 cs04r-sc-mds03-02 kernel: [<ffffffff815236e7>] ? start_secondary+0x2be/0x301
Jun  9 16:01:46 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 16:01:46 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 14 previous similar messages
Jun  9 16:01:46 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53286 of catalog 0x8:10 rc=-2
Jun  9 16:01:46 cs04r-sc-mds03-02 kernel: LustreError: 8137:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 14 previous similar messages
Jun  9 16:18:57 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 16:18:57 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 3 previous similar messages
Jun  9 16:18:57 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53302 of catalog 0x8:10 rc=-2
Jun  9 16:18:57 cs04r-sc-mds03-02 kernel: LustreError: 8081:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 3 previous similar messages
Jun  9 16:40:16 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 16:40:16 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 2 previous similar messages
Jun  9 16:40:16 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53323 of catalog 0x8:10 rc=-2
Jun  9 16:40:16 cs04r-sc-mds03-02 kernel: LustreError: 8133:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 2 previous similar messages
Jun  9 16:41:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 16:41:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 2 previous similar messages
Jun  9 16:41:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53326 of catalog 0x8:10 rc=-2
Jun  9 16:41:41 cs04r-sc-mds03-02 kernel: LustreError: 7408:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 2 previous similar messages
Jun  9 16:45:02 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 16:45:02 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 1 previous similar message
Jun  9 16:45:02 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53331 of catalog 0x8:10 rc=-2
Jun  9 16:45:02 cs04r-sc-mds03-02 kernel: LustreError: 8118:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 1 previous similar message
Jun  9 16:50:09 cs04r-sc-mds03-02 kernel: LustreError: 7548:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  9 16:50:09 cs04r-sc-mds03-02 kernel: LustreError: 7548:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 3 previous similar messages
Jun  9 16:50:09 cs04r-sc-mds03-02 kernel: LustreError: 7548:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 53340 of catalog 0x8:10 rc=-2
Jun  9 16:50:09 cs04r-sc-mds03-02 kernel: LustreError: 7548:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 3 previous similar messages

The actual failure:

Jun  9 16:55:05 cs04r-sc-mds03-02 kernel: LustreError: 8140:0:(llog_cat.c:163:llog_cat_id2handle()) lustre03-MDD0000: error opening log id 0x11564:1:0: rc = -2
Jun  9 16:55:05 cs04r-sc-mds03-02 kernel: LustreError: 8140:0:(llog_cat.c:537:llog_cat_process_cb()) lustre03-MDD0000: cannot find handle for llog 0x11564:1: -2
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) ASSERTION(!lu_object_is_dying(dt->do_lu.lo_header) ) failed:
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: LustreError: 7605:0:(osd_handler.c:2530:osd_object_destroy()) LBUG
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: Pid: 7605, comm: mdt02_006
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel:
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: Call Trace:
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0410895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0410e97>] lbug_with_loc+0x47/0xb0 [libcfs]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0ebc011>] osd_object_destroy+0x3b1/0x460 [osd_ldiskfs]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0536cf4>] llog_osd_destroy+0x5d4/0xd40 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052d3f1>] llog_destroy+0x51/0x170 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052f0c8>] llog_cat_process_cb+0x3a8/0x5f0 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052a5a9>] llog_process_thread+0xaa9/0xe80 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052ed20>] ? llog_cat_process_cb+0x0/0x5f0 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052aabf>] llog_process_or_fork+0x13f/0x540 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052dc5d>] llog_cat_process_or_fork+0x1ad/0x300 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa1242f50>] ? llog_changelog_cancel_cb+0x0/0x1d0 [mdd]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa052ddc9>] llog_cat_process+0x19/0x20 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa1242d7f>] llog_changelog_cancel+0x5f/0x230 [mdd]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa04211c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa0530df8>] llog_cancel+0x58/0x240 [obdclass]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa12491aa>] mdd_changelog_user_purge+0x46a/0x6f0 [mdd]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa1249a8c>] mdd_iocontrol+0x65c/0xb70 [mdd]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa10f8119>] mdt_ioc_child+0x149/0x1d0 [mdt]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa1104c4b>] mdt_iocontrol+0x2fb/0x8e0 [mdt]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa11054b1>] mdt_set_info+0x281/0x430 [mdt]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa078b381>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa07ec56e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa079c5a1>] ptlrpc_main+0xe41/0x1960 [ptlrpc]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8106c4f0>] ? pick_next_task_fair+0xd0/0x130
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffffa079b760>] ? ptlrpc_main+0x0/0x1960 [ptlrpc]
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8109e66e>] kthread+0x9e/0xc0
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Jun  9 16:56:29 cs04r-sc-mds03-02 kernel: 
Comment by Andreas Dilger [ 10/Jun/15 ]

What version of Lustre did you upgrade from?

Comment by Andreas Dilger [ 10/Jun/15 ]

Also, is this hitting repeatedly, or did it go away when the system was restarted?

Mike, it looks like this is related to llog handling during ChangeLog processing. Is it possible there is a race with multiple threads cancelling the same records? In any case, deleting a log file that is already being deleted shouldn't trigger an LASSERT().

Comment by Dave Bond (Inactive) [ 11/Jun/15 ]

This system was upgraded from 2.5, and I have only seen this once. We did, however, have an issue with an MDS not responding after a minor network outage, but I do not have a compelling set of logs to suggest the two are related.

I will put the console output below:

Jun  8 13:27:57 cs04r-sc-mds03-01 kernel: LNet: There was an unexpected network error while writing to 172.23.148.22: -110.
Jun  8 13:30:32 cs04r-sc-mds03-01 kernel: Lustre: MGS: haven't heard from client de0451fe-3e87-4bcb-2ca6-d2af988671be (at 172.23.148.35@tcp) in 227 seconds. I think it's dead, and I am evicting it. exp ffff881fcf0bcc00, cur 1433766632 expire 1433766482 last 1433766405
Jun  8 13:30:32 cs04r-sc-mds03-01 kernel: Lustre: Skipped 1 previous similar message
Jun  8 13:30:48 cs04r-sc-mds03-01 kernel: Lustre: lustre03-MDT0000: Client bb255a22-f3c1-835b-8049-eab34c95ba65 (at 172.23.148.64@tcp) reconnecting
Jun  8 13:30:59 cs04r-sc-mds03-01 kernel: Lustre: lustre03-MDT0000: Client db5a1353-f37b-fe0a-ccf8-9bc50f7a62ad (at 172.23.148.65@tcp) reconnecting
Jun  8 13:31:03 cs04r-sc-mds03-01 kernel: Lustre: MGS: Client b85575c0-8d63-0c39-a18e-c25179bf68dd (at 172.23.148.26@tcp) reconnecting
Jun  8 13:31:08 cs04r-sc-mds03-01 kernel: Lustre: MGS: Client 0e2a3416-2996-0da3-aab5-16ab1d68433f (at 172.23.148.24@tcp) reconnecting
Jun  8 13:31:08 cs04r-sc-mds03-01 kernel: Lustre: Skipped 2 previous similar messages
Jun  8 13:31:38 cs04r-sc-mds03-01 kernel: Lustre: MGS: Client 8945eb8e-242f-a306-9ce7-98c47b58cd6c (at 172.23.148.38@tcp) reconnecting
Jun  8 13:31:38 cs04r-sc-mds03-01 kernel: Lustre: Skipped 1 previous similar message
Jun  8 13:38:59 cs04r-sc-mds03-01 kernel: LustreError: 20218:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  8 13:38:59 cs04r-sc-mds03-01 kernel: LustreError: 20218:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 18 previous similar messages
Jun  8 13:38:59 cs04r-sc-mds03-01 kernel: LustreError: 20218:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 52222 of catalog 0x8:10 rc=-2
Jun  8 13:38:59 cs04r-sc-mds03-01 kernel: LustreError: 20218:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 18 previous similar messages
Jun  8 13:49:04 cs04r-sc-mds03-01 kernel: LustreError: 18959:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  8 13:49:04 cs04r-sc-mds03-01 kernel: LustreError: 18959:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 17 previous similar messages
Jun  8 13:49:04 cs04r-sc-mds03-01 kernel: LustreError: 18959:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 52247 of catalog 0x8:10 rc=-2
Jun  8 13:49:04 cs04r-sc-mds03-01 kernel: LustreError: 18959:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 17 previous similar messages
Jun  8 14:02:30 cs04r-sc-mds03-01 kernel: LustreError: 18965:0:(llog_cat.c:508:llog_cat_cancel_records()) lustre03-MDD0000: fail to cancel 0 of 1 llog-records: rc = -2
Jun  8 14:02:30 cs04r-sc-mds03-01 kernel: LustreError: 18965:0:(llog_cat.c:508:llog_cat_cancel_records()) Skipped 25 previous similar messages
Jun  8 14:02:30 cs04r-sc-mds03-01 kernel: LustreError: 18965:0:(mdd_device.c:260:llog_changelog_cancel()) lustre03-MDD0000: cancel idx 52272 of catalog 0x8:10 rc=-2
Jun  8 14:02:30 cs04r-sc-mds03-01 kernel: LustreError: 18965:0:(mdd_device.c:260:llog_changelog_cancel()) Skipped 25 previous similar messages
Jun  8 14:04:33 cs04r-sc-mds03-01 kernel: LustreError: 19000:0:(llog_cat.c:163:llog_cat_id2handle()) lustre03-MDD0000: error opening log id 0x10f8e:1:0: rc = -2
Jun  8 14:04:33 cs04r-sc-mds03-01 kernel: LustreError: 19000:0:(llog_cat.c:537:llog_cat_process_cb()) lustre03-MDD0000: cannot find handle for llog 0x10f8e:1: -2
Comment by nasf (Inactive) [ 09/Feb/16 ]

[delete unrelated test failure]

Comment by Alex Zhuravlev [ 09/Feb/16 ]

[delete unrelated test failure]

Comment by Alexander Zarochentsev [ 10/Aug/16 ]

Mike, we had the following fix for Lustre-2.1:

MRP-1443 llog: avoid llog cancel race
    
    Concurrently running two or more lfs changelog_clear need to be
    protected against races. llog_process_thread() used to read llogs
    without taking into account that the llog being read may be destroyed
    by another process. This patch serializes changelog cancellings using
    llog_ctxt's mutex.

diff --git a/lustre/mdd/mdd_device.c b/lustre/mdd/mdd_device.c
index 1642f0a..8140208 100644
--- a/lustre/mdd/mdd_device.c
+++ b/lustre/mdd/mdd_device.c
@@ -386,6 +386,7 @@ int mdd_changelog_llog_cancel(const struct lu_env *env,
         if (ctxt == NULL)
                 return -ENXIO;
 
+        cfs_mutex_lock(&ctxt->loc_mutex);
         cfs_spin_lock(&mdd->mdd_cl.mc_lock);
         cur = (long long)mdd->mdd_cl.mc_index;
         cfs_spin_unlock(&mdd->mdd_cl.mc_lock);
@@ -413,6 +414,7 @@ int mdd_changelog_llog_cancel(const struct lu_env *env,
 
         rc = llog_cancel(ctxt, NULL, 1, (struct llog_cookie *)&endrec, 0);
 out:
+        cfs_mutex_unlock(&ctxt->loc_mutex);
         llog_ctxt_put(ctxt);
         return rc;
 }

Do you think it might be useful for 2.7+?
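
For readers unfamiliar with the pattern, the serialization described in the MRP-1443 commit message above can be sketched in user space as follows. This is only an illustration, not Lustre code: `toy_changelog`, `toy_changelog_clear()` and `run_concurrent_clears()` are hypothetical names, a pthread mutex stands in for `ctxt->loc_mutex`, and a per-record counter stands in for destroying an llog. The point is that reading the current position and cancelling records happen under one lock, so a second concurrent clear cannot act on records the first one has already destroyed.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NRECS       1024   /* hypothetical number of changelog records */
#define MAX_THREADS 16

/* Hypothetical stand-in for the changelog state, not a Lustre structure. */
struct toy_changelog {
    pthread_mutex_t lock;        /* plays the role of ctxt->loc_mutex */
    long long first_live;        /* lowest record index not yet cancelled */
    int destroyed[NRECS];        /* times each record has been destroyed */
};

struct clear_args {
    struct toy_changelog *cl;
    long long endrec;
};

static void toy_changelog_init(struct toy_changelog *cl)
{
    pthread_mutex_init(&cl->lock, NULL);
    cl->first_live = 0;
    memset(cl->destroyed, 0, sizeof(cl->destroyed));
}

/*
 * Cancel every live record up to and including endrec.
 * The whole read-check-destroy sequence runs under the mutex.
 * Returns the number of records destroyed by this call.
 */
static long long toy_changelog_clear(struct toy_changelog *cl, long long endrec)
{
    long long i, n = 0;

    if (endrec >= NRECS)
        endrec = NRECS - 1;

    pthread_mutex_lock(&cl->lock);
    for (i = cl->first_live; i <= endrec; i++) {
        cl->destroyed[i]++;
        n++;
    }
    if (endrec >= cl->first_live)
        cl->first_live = endrec + 1;
    assert(cl->first_live <= NRECS);
    pthread_mutex_unlock(&cl->lock);

    return n;
}

static void *clear_thread(void *p)
{
    struct clear_args *a = p;

    toy_changelog_clear(a->cl, a->endrec);
    return NULL;
}

/*
 * Run nthreads concurrent clears up to endrec.
 * Returns the number of records destroyed more than once,
 * or -1 if the requested threads could not all be started.
 */
static int run_concurrent_clears(int nthreads, long long endrec)
{
    struct toy_changelog cl;
    struct clear_args args;
    pthread_t tids[MAX_THREADS];
    int i, started = 0, doubles = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    toy_changelog_init(&cl);
    args.cl = &cl;
    args.endrec = endrec;

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, clear_thread, &args) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    for (i = 0; i < NRECS; i++)
        if (cl.destroyed[i] > 1)
            doubles++;

    pthread_mutex_destroy(&cl.lock);
    return started == nthreads ? doubles : -1;
}
```

With the lock held across the whole sequence, a call such as `run_concurrent_clears(4, 500)` should report no record destroyed twice. Whether the same coarse locking is appropriate in the 2.7+ llog code is the open question in this comment; the sketch does not answer it.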

Comment by Rahul Deshmukh (Inactive) [ 11/Aug/16 ]

Created the bug https://jira.hpdd.intel.com/browse/LU-8496 (Race is changelog clear path). The assertion is different, but the code path seems to be the same, so please review.

Comment by Mikhail Pershin [ 22/Jul/18 ]

Outdated issue. Several patches have since landed to fix llog races and related issues, so this may already be fixed. Reopen if it appears again.
