[LU-1723] osc_page_delete()) ASSERTION(0) failed Created: 08/Aug/12 Updated: 17/Jul/17 Resolved: 29/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Wojciech Turek (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 2 |
| Labels: | None | ||
| Environment: |
Client RHEL-6.2 2.6.32-220.23.1.el6.x86_64 |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 4210 | ||||
| Description |
|
LBUG on production cluster while node was running a user job. Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) page@ffff88106917b5c0[1 ffff8806726e8f48:1263 ^(null)_ffff88106917b500 4 0 1 (null) (null) 0x1] |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 08/Aug/12 ] |
|
Can you please tell me what exactly version you installed on the client node? Also, this patch may help: " |
| Comment by Wojciech Turek (Inactive) [ 08/Aug/12 ] |
|
The clients are running lustre-2.1.2 lustre-client-2.1.2-2.6.32_220.23.1.el6.x86_64.x86_64 Lustre-2.1.2 changelog suggests that it already has a fix for |
| Comment by Jinshan Xiong (Inactive) [ 16/Aug/12 ] |
|
Hi Wojciech Turek, can you please post osc_page_print() function from your source code here? Also which pages have been applied. Thanks in advance. |
| Comment by Wojciech Turek (Inactive) [ 17/Aug/12 ] |
static int osc_page_print(const struct lu_env *env, const struct cl_page_slice *slice, void *cookie, lu_printer_t printer) { struct osc_page *opg = cl2osc_page(slice); struct osc_async_page *oap = &opg->ops_oap; struct osc_object *obj = cl2osc(slice->cpl_obj); struct client_obd *cli = &osc_export(obj)->exp_obd->u.cli; struct lov_oinfo *loi = obj->oo_oinfo; return (*printer)(env, cookie, LUSTRE_OSC_NAME"-page@%p: " "1< %#x %d %u %s %s %s > " "2< "LPU64" %u %u %#x %#x | %p %p %p %p %p > " "3< %s %p %d %lu %d > " "4< %d %d %d %lu %s | %s %s %s %s > " "5< %s %s %s %s | %d %s %s | %d %s %s>\n", opg, /* 1 */ oap->oap_magic, oap->oap_cmd, oap->oap_interrupted, osc_list(&oap->oap_pending_item), osc_list(&oap->oap_urgent_item), osc_list(&oap->oap_rpc_item), /* 2 */ oap->oap_obj_off, oap->oap_page_off, oap->oap_count, oap->oap_async_flags, oap->oap_brw_flags, oap->oap_request, oap->oap_cli, oap->oap_loi, oap->oap_caller_ops, oap->oap_caller_data, /* 3 */ osc_list(&opg->ops_inflight), opg->ops_submitter, opg->ops_transfer_pinned, osc_submit_duration(opg), opg->ops_srvlock, /* 4 */ cli->cl_r_in_flight, cli->cl_w_in_flight, cli->cl_max_rpcs_in_flight, cli->cl_avail_grant, osc_list(&cli->cl_cache_waiters), osc_list(&cli->cl_loi_ready_list), osc_list(&cli->cl_loi_hp_ready_list), osc_list(&cli->cl_loi_write_list), osc_list(&cli->cl_loi_read_list), /* 5 */ osc_list(&loi->loi_ready_item), osc_list(&loi->loi_hp_ready_item), osc_list(&loi->loi_write_item), osc_list(&loi->loi_read_item), loi->loi_read_lop.lop_num_pending, osc_list(&loi->loi_read_lop.lop_pending), osc_list(&loi->loi_read_lop.lop_urgent), loi->loi_write_lop.lop_num_pending, osc_list(&loi->loi_write_lop.lop_pending), osc_list(&loi->loi_write_lop.lop_urgent)); } |
| Comment by Jinshan Xiong (Inactive) [ 17/Aug/12 ] |
|
Hi Wojciech Turek, Thanks. It looks like the code mismatches the output from your first comment. BTW, you can use: {code} /* C code */ {code} to quote source code on jira. |
| Comment by Wojciech Turek (Inactive) [ 17/Aug/12 ] |
|
Hi Jinshan, I don't understand why you say it mismatches, can you elaborate please? |
| Comment by Jinshan Xiong (Inactive) [ 17/Aug/12 ] |
|
Maybe the output was truncated, from the output you posted: Aug 8 16:00:02 sand-4-52 kernel: LustreError: 64082:0:(osc_page.c:425:osc_page_delete()) osc-page@ffff881067d54bc8: 1< 0x845fed 1 0 - - + > 2< 5173248 0 4096 0x0 0x8 | (null) ffff8806d7f24688 ffff88016958f700 ffffffffa07ff> only paragraph 1 <...> and 2 <...> were printed, but I expect to see all 5 paragraphs. How easy you can reproduce this problem, if possible can you please trigger it and post the output on jira again? |
| Comment by Oleg Drokin [ 13/Oct/12 ] |
|
I just hit this running racer on a small local fs (That gets ENOSPC easily): [12126.733752] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Write page 0 of inode ffff88004431db20 failed -28
[12126.734303] LustreError: 11222:0:(vvp_io.c:1038:vvp_io_commit_write()) Skipped 708 previous similar messages
[12198.367486] LustreError: 26187:0:(osc_cache.c:2343:osc_teardown_async_page()) extent ffff88002223ef38@{[256 -> 402/511], [2|0|-|cache|wi|ffff880088af7eb0], [602112|147|+|-|ffff880032770ed8|256|(null)]} trunc at 256.
[12198.368328] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff880011a6def0[2 ffff8800169d7f88:256 ^(null)_ffff8800868dfef0 4 0 1 (null) (null) 0x0]
[12198.369065] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) page@ffff8800868dfef0[1 ffff88004b855f98:256 ^ffff880011a6def0_(null) 4 0 1 (null) (null) 0x0]
[12198.369829] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) vvp-page@ffff88009f775af8(0:0:0) vm@ffffea00011fc5f8 20000000000035 2:0 0 256 lru
[12198.370386] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) lov-page@ffff880010cfbae8
[12198.370856] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) osc-page@ffff88008e13ef08: 1< 0x845fed 258 0 + - > 2< 1048576 0 4096 0x0 0x520 | (null) ffff88006f5c8cb0 ffff880088af7eb0 > 3< + ffff880051112540 0 0 0 > 4< 0 0 8 35164160 - | - - + - > 5< - - + - | 0 - | 147 - ->
[12198.371911] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) end page@ffff880011a6def0
[12198.372377] LustreError: 26187:0:(osc_page.c:411:osc_page_delete()) Trying to teardown failed: -16
[12198.372850] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) ASSERTION( 0 ) failed:
[12198.373329] LustreError: 26187:0:(osc_page.c:412:osc_page_delete()) LBUG
[12198.373602] Pid: 26187, comm: ldlm_bl_21
[12198.373826]
[12198.373827] Call Trace:
[12198.374850] [<ffffffffa0caa915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[12198.375155] [<ffffffffa0caaf27>] lbug_with_loc+0x47/0xb0 [libcfs]
[12198.375436] [<ffffffffa04cec1d>] osc_page_delete+0x29d/0x2a0 [osc]
[12198.375752] [<ffffffffa11d8e65>] cl_page_delete0+0xd5/0x500 [obdclass]
[12198.376052] [<ffffffffa11d92d2>] cl_page_delete+0x42/0x120 [obdclass]
[12198.376357] [<ffffffffa0ac37cd>] ll_invalidatepage+0x8d/0x180 [lustre]
[12198.376692] [<ffffffffa0ad592f>] vvp_page_discard+0x9f/0x140 [lustre]
[12198.376989] [<ffffffffa11d6358>] cl_page_invoid+0x78/0x170 [obdclass]
[12198.377307] [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass]
[12198.377607] [<ffffffffa11d6463>] cl_page_discard+0x13/0x20 [obdclass]
[12198.377903] [<ffffffffa11e08ae>] check_and_discard_cb+0x13e/0x160 [obdclass]
[12198.378220] [<ffffffffa11db54c>] cl_page_gang_lookup+0x1fc/0x410 [obdclass]
[12198.378527] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
[12198.378834] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
[12198.379144] [<ffffffffa11de03e>] cl_lock_discard_pages+0x11e/0x1f0 [obdclass]
[12198.379607] [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass]
[12198.379888] [<ffffffffa04cfb80>] osc_lock_flush+0x110/0x200 [osc]
[12198.380167] [<ffffffffa04cfcc9>] osc_lock_cancel+0x59/0x1a0 [osc]
[12198.380454] [<ffffffffa11dbda5>] cl_lock_cancel0+0x75/0x160 [obdclass]
[12198.380752] [<ffffffffa11dca0b>] cl_lock_cancel+0x13b/0x140 [obdclass]
[12198.381045] [<ffffffffa04d19ba>] osc_ldlm_blocking_ast+0x13a/0x380 [osc]
[12198.381368] [<ffffffffa12dbaf3>] ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
[12198.381831] [<ffffffffa12dbf31>] ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc]
[12198.382119] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
[12198.382408] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
[12198.382692] [<ffffffff8100c14a>] child_rip+0xa/0x20
[12198.382958] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
[12198.383826] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
[12198.384126] [<ffffffff8100c140>] ? child_rip+0x0/0x20
[12198.384378]
[12198.455840] Kernel panic - not syncing: LBUG
[12198.456115] Pid: 26187, comm: ldlm_bl_21 Not tainted 2.6.32-debug #6
[12198.456423] Call Trace:
[12198.456632] [<ffffffff814f75e4>] ? panic+0xa0/0x168
[12198.456887] [<ffffffffa0caaf7b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[12198.457201] [<ffffffffa04cec1d>] ? osc_page_delete+0x29d/0x2a0 [osc]
[12198.457516] [<ffffffffa11d8e65>] ? cl_page_delete0+0xd5/0x500 [obdclass]
[12198.457816] [<ffffffffa11d92d2>] ? cl_page_delete+0x42/0x120 [obdclass]
[12198.458127] [<ffffffffa0ac37cd>] ? ll_invalidatepage+0x8d/0x180 [lustre]
[12198.458420] [<ffffffffa0ad592f>] ? vvp_page_discard+0x9f/0x140 [lustre]
[12198.458718] [<ffffffffa11d6358>] ? cl_page_invoid+0x78/0x170 [obdclass]
[12198.459014] [<ffffffffa11d68a8>] ? cl_page_invoke+0x158/0x1c0 [obdclass]
[12198.459328] [<ffffffffa11d6463>] ? cl_page_discard+0x13/0x20 [obdclass]
[12198.459627] [<ffffffffa11e08ae>] ? check_and_discard_cb+0x13e/0x160 [obdclass]
[12198.460126] [<ffffffffa11db54c>] ? cl_page_gang_lookup+0x1fc/0x410 [obdclass]
[12198.460617] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
[12198.460970] [<ffffffffa11e0770>] ? check_and_discard_cb+0x0/0x160 [obdclass]
[12198.462162] [<ffffffffa11de03e>] ? cl_lock_discard_pages+0x11e/0x1f0 [obdclass]
[12198.462646] [<ffffffffa11d33d5>] ? cl_env_info+0x15/0x20 [obdclass]
[12198.462932] [<ffffffffa04cfb80>] ? osc_lock_flush+0x110/0x200 [osc]
[12198.463225] [<ffffffffa04cfcc9>] ? osc_lock_cancel+0x59/0x1a0 [osc]
[12198.463533] [<ffffffffa11dbda5>] ? cl_lock_cancel0+0x75/0x160 [obdclass]
[12198.463829] [<ffffffffa11dca0b>] ? cl_lock_cancel+0x13b/0x140 [obdclass]
[12198.464120] [<ffffffffa04d19ba>] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc]
[12198.464436] [<ffffffffa12dbaf3>] ? ldlm_handle_bl_callback+0x123/0x2e0 [ptlrpc]
[12198.464898] [<ffffffffa12dbf31>] ? ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc]
[12198.465240] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
[12198.465533] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
[12198.465815] [<ffffffff8100c14a>] ? child_rip+0xa/0x20
[12198.466084] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
[12198.466393] [<ffffffffa12dbcb0>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
[12198.466699] [<ffffffff8100c140>] ? child_rip+0x0/0x20
I have vmcore for this occurence |
| Comment by Jinshan Xiong (Inactive) [ 20/Oct/12 ] |
|
duplicate of |
| Comment by Bruce Korb (Inactive) [ 14/Nov/12 ] |
|
I am a little unclear as to how this was diagnosed as a duplicate of So, I don't see the obvious reason the osc_extent_merge assertion would be |
| Comment by Jinshan Xiong (Inactive) [ 14/Nov/12 ] |
|
In Depending on the time when this lock is found by other process, difference symptoms will show up. For example, in osc_extent_merge(), I discovered that one of the extent's oe_lock pointed to that "glimpse write" lock, and another extent had the normal one. For the osc_page_delete() case, that was because one dirty page was added into the "glimpse write" lock, since there is no cancel for this lock, the page still remained in the cache after the lock was deleted. Afterwards, when a read lock was enqueued and canceled, it discovered this page and then hit the assert. |
| Comment by Bruce Korb (Inactive) [ 14/Nov/12 ] |
|
OK, well we (Cory at Cray) see the same symptom, even after the > Denis Kondratenko added a comment - 06/Nov/12 4:31 AM The SP2 version of the cdump was, unfortunately, corrupt, so I've been futzing I can attach the SP2 console log, if you like. But not the "lctl dk" dump, |
| Comment by Jinshan Xiong (Inactive) [ 14/Nov/12 ] |
|
this is a different issue from what have described by Oleg |
| Comment by Bruce Korb (Inactive) [ 14/Nov/12 ] |
|
OK. Thank you. It just looks a lot alike. 2012-11-06T14:57:10.872333-06:00 c0-0c1s6n0 LustreError: 5270:0:(osc_cache.c:2367:osc_teardown_async_page()) extent ffff88060eedfe58@ {[23 -> 23/255], [2|0|-|cache|wi|ffff88020e18f8c8], [4096|1|+|-|ffff8801f9ed9c18|256| (null)]} trunc at 23. |
| Comment by Bruce Korb (Inactive) [ 15/Nov/12 ] |
|
I don't see an obvious way to add attachments, so I'll x-clip and paste some of the log info.
The connection recovers, according to logs:
Then about 400,000 copies of this:
followed 0.2 seconds with the LBUG:
|
| Comment by Bruce Korb (Inactive) [ 30/Nov/12 ] |
|
Here is an xz compressed "dk" log from a crash dump that is an example of the issue. The two processes 617 and 619 are at issue. There is a timing hole of some sort. I will append Shadow's and my analysis shortly. |
| Comment by Bruce Korb (Inactive) [ 30/Nov/12 ] |
|
With a crash dump with debug tracing turned on, we get a lot more useful information. Alexey Lyashkov added a comment - 30/Nov/12 12:27 AM In fact that is bug in new osc extent code landed into 2.3 :/ both code path 1) ldlm lock cancel -> page discard .... and 2) mkwrite /race/ -> invalidate page forget to find a extent and set correct state for it. looks osc extent author forget about forced page flush from a page cache with mkwrite race and client eviction. for first case that is more or less clear - cl_lock_discard_pages foget to find an extent and set truncate state, because use a old API and directly kill a pages. that is may be fixed via replace cl_lock_discard_pages [with a] osc_extent_truncate_start or something similar. but for second case it's not clear for me. [It] looks [like] we need [to] completely change code near of mkwrite and page invalidation. [Otherwise] (may be dirty hack) - we change osc_page_delete to ignore error in osc_teardown_async_page for some 'force' cases or (better) correctly abort an IO request for that page if page discard need (page may be in readahead as was in 790798-osc_page_delete-assert0-sp1-1120-fulldbg.tar.bz2). |
| Comment by Bruce Korb (Inactive) [ 07/Dec/12 ] |
|
More comments on Xyratex bug. Given the difficulty/complexity of the change, I would happily accept guidance here. (i.e. wait for you-all or suggestions on how to proceed) Iurii Golovach added a comment - 06/Dec/12 4:34 AM During implementation of extents on OSC level, two cases fell between cracks:
Since adding these cases is quite complex task we recommend to do next: Please let us know if you need patch against 2.3 to reverse these patches. Cory Spitz added a comment - 06/Dec/12 2:42 PM Hi, that's a pretty big change. How many bug fixes would we loose? I'm not sure that the tradeoff for this bug is worth it (yet). Can you guys help w/ |
| Comment by Bruce Korb (Inactive) [ 19/Dec/12 ] |
|
In today's Xyratex/Cray conference call, it was decided to go forward with backing out the |
| Comment by Cory Spitz [ 19/Dec/12 ] |
|
I think that this should be a 2.4 release blocker. |
| Comment by Oleg Drokin [ 23/Dec/12 ] |
|
Hm, I think I just hit this issue on my burn-in cluster as well: [56349.656614] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800805c6f08[3 ffff88000bf11f80:256 ^(null)_ffff880014be0f08 0 0 1 (null) (null) 0x0]
[56349.657354] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff880014be0f08[2 ffff88002a8ecf98:256 ^ffff8800805c6f08_(null) 0 0 1 (null) (null) 0x0]
[56349.658145] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88006e65eb60(0:0:0) vm@ffffea00021b5a58 2000000000083c 2:0 ffff8800805c6f08 256 lru
[56349.658890] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff88002510ee00
[56349.659360] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff880062170f08: 1< 0x845fed 258 0 + - > 2< 1048576 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.660491] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff8800805c6f08
[56349.660996] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.661284] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff880014be1f08[3 ffff88000bf11f80:257 ^(null)_ffff88001333ef08 0 0 1 (null) (null) 0x0]
[56349.662042] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff88001333ef08[2 ffff88002a8ecf98:257 ^ffff880014be1f08_(null) 0 0 1 (null) (null) 0x0]
[56349.662802] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88006e65ebc8(0:0:0) vm@ffffea0001cef2d0 2000000000083c 2:0 ffff880014be1f08 257 lru
[56349.663548] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff88002510e620
[56349.664016] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff88002b782f08: 1< 0x845fed 258 0 + - > 2< 1052672 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.665336] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff880014be1f08
[56349.665805] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.666679] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff880093a26f08[3 ffff88000bf11f80:258 ^(null)_ffff88003c596f08 0 0 1 (null) (null) 0x0]
[56349.667419] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff88003c596f08[2 ffff88002a8ecf98:258 ^ffff880093a26f08_(null) 0 0 1 (null) (null) 0x0]
[56349.668166] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88006e65ec30(0:0:0) vm@ffffea0002285560 2000000000083c 2:0 ffff880093a26f08 258 lru
[56349.668917] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff88002510ece0
[56349.669389] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff88005bf32f08: 1< 0x845fed 258 0 + - > 2< 1056768 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.671402] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff880093a26f08
[56349.671883] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.672186] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800304a3f08[3 ffff88000bf11f80:259 ^(null)_ffff880032388f08 0 0 1 (null) (null) 0x0]
[56349.672953] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff880032388f08[2 ffff88002a8ecf98:259 ^ffff8800304a3f08_(null) 0 0 1 (null) (null) 0x0]
[56349.673667] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88006e65ec98(0:0:0) vm@ffffea0001d40e80 2000000000083c 2:0 ffff8800304a3f08 259 lru
[56349.674403] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff8800740957d0
[56349.674898] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff8800922cff08: 1< 0x845fed 258 0 + - > 2< 1060864 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.676000] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff8800304a3f08
[56349.676500] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.676872] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800304a2f08[3 ffff88000bf11f80:260 ^(null)_ffff8800639baf08 0 0 1 (null) (null) 0x0]
[56349.677605] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800639baf08[2 ffff88002a8ecf98:260 ^ffff8800304a2f08_(null) 0 0 1 (null) (null) 0x0]
[56349.678374] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88006e65ed00(0:0:0) vm@ffffea0002114048 2000000000083c 2:0 ffff8800304a2f08 260 lru
[56349.679110] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff880074095c98
[56349.679560] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff880081cf0f08: 1< 0x845fed 258 0 + - > 2< 1064960 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.680665] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff8800304a2f08
[56349.681134] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.681428] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800639bbf08[3 ffff88000bf11f80:261 ^(null)_ffff8800223a9f08 0 0 1 (null) (null) 0x0]
[56349.682160] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800223a9f08[2 ffff88002a8ecf98:261 ^ffff8800639bbf08_(null) 0 0 1 (null) (null) 0x0]
[56349.682893] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88006e65ed68(0:0:0) vm@ffffea000218f5a0 2000000000083c 2:0 ffff8800639bbf08 261 lru
[56349.683625] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff880074095ce0
[56349.684092] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff88001309df08: 1< 0x845fed 258 0 + - > 2< 1069056 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.685177] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff8800639bbf08
[56349.685623] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.685921] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800223a8f08[3 ffff88000bf11f80:262 ^(null)_ffff8800248a3f08 0 0 1 (null) (null) 0x0]
[56349.686682] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800248a3f08[2 ffff88002a8ecf98:262 ^ffff8800223a8f08_(null) 0 0 1 (null) (null) 0x0]
[56349.687427] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88002d0ee0d0(0:0:0) vm@ffffea0001f0b178 2000000000083c 2:0 ffff8800223a8f08 262 lru
[56349.688184] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff880074095ae8
[56349.688665] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff88008e200f08: 1< 0x845fed 258 0 + - > 2< 1073152 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.689748] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff8800223a8f08
[56349.690214] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.690501] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff880075c2ef08[3 ffff88000bf11f80:263 ^(null)_ffff880033ee2f08 0 0 1 (null) (null) 0x0]
[56349.691237] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff880033ee2f08[2 ffff88002a8ecf98:263 ^ffff880075c2ef08_(null) 0 0 1 (null) (null) 0x0]
[56349.691967] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88002d0ee4e0(0:0:0) vm@ffffea00020fec90 2000000000083c 2:0 ffff880075c2ef08 263 lru
[56349.692706] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff880074095aa0
[56349.693172] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff880093ce0f08: 1< 0x845fed 258 0 + - > 2< 1077248 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.694248] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff880075c2ef08
[56349.694717] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.695010] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff880033ee3f08[3 ffff88000bf11f80:264 ^(null)_ffff880061765f08 0 0 1 (null) (null) 0x0]
[56349.696293] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff880061765f08[2 ffff88002a8ecf98:264 ^ffff880033ee3f08_(null) 0 0 1 (null) (null) 0x0]
[56349.697945] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88002d0eea90(0:0:0) vm@ffffea0001f481f8 2000000000083c 2:0 ffff880033ee3f08 264 lru
[56349.698697] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff880074095740
[56349.699176] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff8800794ccf08: 1< 0x845fed 258 0 + - > 2< 1081344 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.700246] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff880033ee3f08
[56349.700732] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.701026] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff8800393f6f08[3 ffff88000bf11f80:265 ^(null)_ffff88008b6e9f08 0 0 1 (null) (null) 0x0]
[56349.701758] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) page@ffff88008b6e9f08[2 ffff88002a8ecf98:265 ^ffff8800393f6f08_(null) 0 0 1 (null) (null) 0x0]
[56349.702492] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) vvp-page@ffff88002d0eec30(0:0:0) vm@ffffea000277cc78 2000000000083c 2:0 ffff8800393f6f08 265 lru
[56349.703850] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) lov-page@ffff880074095110
[56349.704321] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) osc-page@ffff88006b098f08: 1< 0x845fed 258 0 + - > 2< 1085440 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 1 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.705396] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) end page@ffff8800393f6f08
[56349.706003] LustreError: 20323:0:(osc_io.c:364:trunc_check_cb()) exists 0/cp.
[56349.758028] LustreError: 20323:0:(osc_cache.c:2370:osc_teardown_async_page()) extent ffff88006e1f1f38@{[256 -> 506/511], [2|0|-|cache|wi|ffff880078590eb0], [1028096|251|+|-|ffff88002b60ced8|256|(null)]} trunc at 256.
[56349.759172] LustreError: 20323:0:(osc_page.c:431:osc_page_delete()) page@ffff8800805c6f08[2 ffff88000bf11f80:256 ^(null)_ffff880014be0f08 4 0 1 (null) (null) 0x0]
[56349.759981] LustreError: 20323:0:(osc_page.c:431:osc_page_delete()) page@ffff880014be0f08[2 ffff88002a8ecf98:256 ^ffff8800805c6f08_(null) 4 0 1 (null) (null) 0x0]
[56349.760931] LustreError: 20323:0:(osc_page.c:431:osc_page_delete()) vvp-page@ffff88006e65eb60(0:0:0) vm@ffffea00021b5a58 20000000000035 3:0 0 256 lru
[56349.761537] LustreError: 20323:0:(osc_page.c:431:osc_page_delete()) lov-page@ffff88002510ee00
[56349.762131] LustreError: 20323:0:(osc_page.c:431:osc_page_delete()) osc-page@ffff880062170f08: 1< 0x845fed 258 0 + - > 2< 1048576 0 4096 0x0 0x520 | (null) ffff88007d410cf0 ffff880078590eb0 > 3< + ffff88006fc50140 0 0 0 > 4< 0 0 8 20480 - | - - + - > 5< - - + - | 0 - | 251 - ->
[56349.763222] LustreError: 20323:0:(osc_page.c:431:osc_page_delete()) end page@ffff8800805c6f08
[56349.763780] LustreError: 20323:0:(osc_page.c:431:osc_page_delete()) Trying to teardown failed: -16
[56349.764587] LustreError: 20323:0:(osc_page.c:432:osc_page_delete()) ASSERTION( 0 ) failed:
[56349.765113] LustreError: 20323:0:(osc_page.c:432:osc_page_delete()) LBUG
[56349.765406] Pid: 20323, comm: cp
[56349.765648]
[56349.765648] Call Trace:
[56349.766258] [<ffffffffa0405915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[56349.766719] [<ffffffffa0405f27>] lbug_with_loc+0x47/0xb0 [libcfs]
[56349.767070] [<ffffffffa09bd5dd>] osc_page_delete+0x32d/0x330 [osc]
[56349.767381] [<ffffffffa05d6b15>] cl_page_delete0+0xc5/0x510 [obdclass]
[56349.767713] [<ffffffffa05d6fa2>] cl_page_delete+0x42/0x120 [obdclass]
[56349.768192] [<ffffffffa0e4025d>] ll_invalidatepage+0x8d/0x180 [lustre]
[56349.768623] [<ffffffff81129ee8>] do_invalidatepage+0x28/0x30
[56349.768950] [<ffffffff8112a102>] truncate_inode_page+0xa2/0xc0
[56349.769216] [<ffffffff8112a42a>] truncate_inode_pages_range+0x15a/0x460
[56349.769515] [<ffffffff8112a745>] truncate_inode_pages+0x15/0x20
[56349.769877] [<ffffffff8112a797>] truncate_pagecache+0x47/0x70
[56349.770297] [<ffffffff8112a7d9>] truncate_setsize+0x19/0x20
[56349.770657] [<ffffffff8112a81e>] vmtruncate+0x3e/0x70
[56349.771010] [<ffffffffa0e53a53>] vvp_io_setattr_end+0x63/0x80 [lustre]
[56349.771349] [<ffffffffa05dec60>] cl_io_end+0x60/0x150 [obdclass]
[56349.771769] [<ffffffffa05e3d82>] cl_io_loop+0xc2/0x1b0 [obdclass]
[56349.772084] [<ffffffffa0e4c448>] cl_setattr_ost+0x208/0x2d0 [lustre]
[56349.772389] [<ffffffffa0e1c122>] ll_setattr_raw+0x792/0xfb0 [lustre]
[56349.772706] [<ffffffffa0e1c99b>] ll_setattr+0x5b/0xf0 [lustre]
[56349.772987] [<ffffffff81197688>] notify_change+0x168/0x340
[56349.773252] [<ffffffff81179af4>] do_truncate+0x64/0xa0
[56349.773530] [<ffffffff8118c309>] do_filp_open+0x829/0xd90
[56349.773854] [<ffffffff8118827b>] ? getname+0x3b/0x250
[56349.774277] [<ffffffff81044f4e>] ? kernel_map_pages+0xfe/0x110
[56349.774708] [<ffffffff8119861b>] ? alloc_fd+0xab/0x160
[56349.775166] [<ffffffff81178839>] do_sys_open+0x69/0x140
[56349.775569] [<ffffffff81178950>] sys_open+0x20/0x30
[56349.775968] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[56349.776371]
[56349.777741] Kernel panic - not syncing: LBUG
vmcore and modules are in /exports/crashdumps/192.168.10.218-2012-12-23-08:15:29 |
| Comment by Jinshan Xiong (Inactive) [ 26/Dec/12 ] |
|
Hi Oleg, please file a new bug for this because it seems like a new issue. From the stacktrace and crashdump, an extent was not truncated by osc_cache_truncate_start() somehow. |
| Comment by Bruce Korb (Inactive) [ 02/Jan/13 ] |
|
Since this was filed against 2.1.2 and the problem we are trying to deal with only occurs on the 2.3 source base, we have opened a new bug report: |
| Comment by Andreas Dilger [ 29/May/17 ] |
|
Close old ticket. |