Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.1.2
-
None
-
Client RHEL-6.2 2.6.32-220.23.1.el6.x86_64
-
3
-
4210
Description
LBUG on production cluster while node was running a user job.
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) page@ffff88106917b5c0[1 ffff8806726e8f48:1263 ^(null)_ffff88106917b500 4 0 1 (null) (null) 0x1]
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) page@ffff88106917b500[1 ffff880770b2c4c8:1263 ^ffff88106917b5c0_(null) 4 0 1 (null) (null) 0x0]
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) vvp-page@ffff8809d88c6be0(1:0:0) vm@ffffea0034f6ce20 c0000000000001 3:0 0 1263 lru
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) lov-page@ffff8809f22e93a8
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) osc-page@ffff881067d54bc8: 1< 0x845fed 1 0 - - + > 2< 5173248 0 4096 0x0 0x8 | (null) ffff8806d7f24688 ffff88016958f700 ffffffffa07ff>
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) end page@ffff88106917b5c0
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) Trying to teardown failed: -16
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:426:osc_page_delete()) ASSERTION(0) failed
Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:426:osc_page_delete()) LBUG
Aug 8 16:00:02 sand-4-52 kernel: Pid: 64082, comm: calculate_propa
Aug 8 16:00:02 sand-4-52 kernel:
Aug 8 16:00:02 sand-4-52 kernel: Call Trace:
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0446855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0446e95>] lbug_with_loc+0x75/0xe0 [libcfs]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0451d86>] libcfs_assertion_failed+0x66/0x70 [libcfs]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa07f8df6>] osc_page_delete+0x236/0x240 [osc]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0553d1e>] cl_page_delete0+0xce/0x400 [obdclass]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa055128e>] ? cl_env_get+0x19e/0x350 [obdclass]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0550bf6>] ? cl_env_peek+0x36/0x110 [obdclass]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa055408d>] cl_page_delete+0x3d/0xf0 [obdclass]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa0560bde>] ? cl_io_is_going+0xe/0x20 [obdclass]
Aug 8 16:00:02 sand-4-52 kernel: [<ffffffffa08e791b>] ll_releasepage+0x10b/0x150 [lustre]
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81168ff0>] ? mem_cgroup_uncharge_cache_page+0x10/0x20
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff811100b0>] try_to_release_page+0x30/0x60
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112a4f1>] shrink_page_list.clone.0+0x4f1/0x5c0
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112a8bb>] shrink_inactive_list+0x2fb/0x740
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112b5cf>] shrink_zone+0x38f/0x520
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112c374>] zone_reclaim+0x354/0x410
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8112cfc0>] ? isolate_pages_global+0x0/0x350
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81122874>] get_page_from_freelist+0x694/0x820
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81123af1>] __alloc_pages_nodemask+0x111/0x940
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8116a728>] ? __mem_cgroup_try_charge+0x78/0x420
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff811586ca>] alloc_pages_vma+0x9a/0x150
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81172015>] do_huge_pmd_anonymous_page+0x145/0x370
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8113c79a>] handle_mm_fault+0x25a/0x2b0
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff81042c29>] __do_page_fault+0x139/0x480
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814ed250>] ? thread_return+0x4e/0x76e
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814f2c8e>] do_page_fault+0x3e/0xa0
Aug 8 16:00:03 sand-4-52 kernel: [<ffffffff814f0045>] page_fault+0x25/0x30
Aug 8 16:00:03 sand-4-52 kernel:
Aug 8 16:00:03 sand-4-52 kernel: LustreError: dumping log to /tmp/lustre-log.1344438003.64082
OK. Thank you. It just looks a lot alike.
I will keep reading code.
2012-11-06T14:57:10.872333-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_cache.c:2367:osc_teardown_async_page()) extent ffff88060eedfe58@
{[23 -> 23/255], [2|0|-|cache|wi|ffff88020e18f8c8], [4096|1|+|-|ffff8801f9ed9c18|256| (null)]}trunc at 23.
2012-11-06T14:57:10.872389-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) page@ffff88080e8c8bc0[2 ffff88020cefad08:23 ^ (null)_ffff88080e8c8b00 4 0 1 (null) (null) 0x0]
2012-11-06T14:57:10.902445-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) page@ffff88080e8c8b00[1 ffff88020d8c1f58:23 ^ffff88080e8c8bc0_ (null) 4 0 1 (null) (null) 0x0]
2012-11-06T14:57:10.902494-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) vvp-page@ffff88080e8cf5a0(0:0:0) vm@ffffea001c6a99b8 e00000000000063 7:0 0 23 lru
2012-11-06T14:57:10.902508-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) lov-page@ffff8808050a7888
2012-11-06T14:57:10.958005-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) osc-page@ffff88080d78ed20: 1< 0x845fed 258 0 + - > 2< 94208 0 4096 0x0 0x520 | (null) ffff880804350700 ffff88020e18f8c8 > 3< + ffff8801f8f577b0 0 0 0 > 4< 0 0 8 33824768 - | - - + - > 5< - - + - | 0 - | 1 - ->
2012-11-06T14:57:10.958049-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) end page@ffff88080e8c8bc0
2012-11-06T14:57:10.983504-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:411:osc_page_delete()) Trying to teardown failed: -16
2012-11-06T14:57:10.983536-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:412:osc_page_delete()) ASSERTION( 0 ) failed:
2012-11-06T14:57:10.983549-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_page.c:412:osc_page_delete()) LBUG
2012-11-06T14:57:10.983570-06:00 c0-0c1s6n0 Pid: 5270, comm: fsx-linux
2012-11-06T14:57:10.983582-06:00 c0-0c1s6n0 Call Trace:
2012-11-06T14:57:10.983605-06:00 c0-0c1s6n0 [<ffffffff810063b1>] try_stack_unwind+0x161/0x1a0
2012-11-06T14:57:11.009114-06:00 c0-0c1s6n0 [<ffffffff81004bf9>] dump_trace+0x89/0x440
2012-11-06T14:57:11.009138-06:00 c0-0c1s6n0 [<ffffffffa014e887>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
2012-11-06T14:57:11.009161-06:00 c0-0c1s6n0 [<ffffffffa014ede7>] lbug_with_loc+0x47/0xc0 [libcfs]
2012-11-06T14:57:11.009173-06:00 c0-0c1s6n0 [<ffffffffa0671d41>] osc_page_delete+0x2d1/0x2e0 [osc]
2012-11-06T14:57:11.034700-06:00 c0-0c1s6n0 [<ffffffffa02b0095>] cl_page_delete0+0xd5/0x4e0 [obdclass]
2012-11-06T14:57:11.034737-06:00 c0-0c1s6n0 [<ffffffffa02b04e2>] cl_page_delete+0x42/0x120 [obdclass]
2012-11-06T14:57:11.034762-06:00 c0-0c1s6n0 [<ffffffffa07f2e2d>] ll_invalidatepage+0x8d/0x170 [lustre]
2012-11-06T14:57:11.034774-06:00 c0-0c1s6n0 [<ffffffffa07ea290>] ll_page_mkwrite+0x7c0/0x840 [lustre]
2012-11-06T14:57:11.034820-06:00 c0-0c1s6n0 [<ffffffff81107cb7>] __do_fault+0xe7/0x570
2012-11-06T14:57:11.034833-06:00 c0-0c1s6n0 [<ffffffff811081e4>] handle_pte_fault+0xa4/0xcd0
2012-11-06T14:57:11.060561-06:00 c0-0c1s6n0 [<ffffffff81108fbe>] handle_mm_fault+0x1ae/0x240
2012-11-06T14:57:11.060588-06:00 c0-0c1s6n0 [<ffffffff81025471>] do_page_fault+0x191/0x410
2012-11-06T14:57:11.060600-06:00 c0-0c1s6n0 [<ffffffff81301b5f>] page_fault+0x1f/0x30
2012-11-06T14:57:11.060647-06:00 c0-0c1s6n0 [<00000000200422b3>] 0x200422b3
2012-11-06T14:57:11.060660-06:00 c0-0c1s6n0 Kernel panic - not syncing: LBUG