[LU-77] cl_page.c::cl_page_own0() assertion in echoclient Created: 09/Feb/11 Updated: 11/May/11 Resolved: 05/May/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Oleg Drokin | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Bugzilla ID: | 24361 |
| Rank (Obsolete): | 5052 |
| Description |
|
Oracle reports this assertion failure when running obdfilter-survey:
Lustre: DEBUG MARKER: == obdfilter-survey test 2a: Stripe F/S over the Network
Call Trace:
Kernel panic - not syncing: LBUG
Eric Mei comments that the obdecho threads apparently share pages that they should not share. |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 16/Feb/11 ] |
|
I'm quite sure that obdfilter-survey was doing a rewrite when it hit this assertion. The root cause is that the previous write has not finished when the rewrite of the same object arrives. The previous write has not finished because the osc is busy (it has 7 write RPCs in flight), but why the osc is that busy is unknown. Maybe we can fix this problem by introducing a page lock for echo_page, so that a subsequent write to the same page blocks until the earlier one is done with it. |
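The page lock proposed above can be sketched in user space with a pthread mutex. Everything here is hypothetical: `struct echo_page_model` and its helpers are illustrative names, not the obdecho `echo_page` code, and a mutex is just one way to get the blocking behaviour described. Ownership is taken under the lock, so a rewrite of the same page waits until the earlier write disowns it, and the "no current owner" check always holds.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an echo client page; NOT the real
 * struct echo_page from obdecho. */
struct echo_page_model {
        pthread_mutex_t ep_lock;   /* the proposed per-page lock */
        const void     *ep_owner;  /* I/O that currently owns the page */
};

static void echo_page_model_init(struct echo_page_model *pg)
{
        pthread_mutex_init(&pg->ep_lock, NULL);
        pg->ep_owner = NULL;
}

/* Take ownership. If an earlier write still holds the page, this
 * blocks on ep_lock instead of proceeding, so the "no current owner"
 * check below cannot fail for callers that go through here. */
static void echo_page_model_own(struct echo_page_model *pg, const void *io)
{
        pthread_mutex_lock(&pg->ep_lock);
        assert(pg->ep_owner == NULL);
        pg->ep_owner = io;
}

/* Release ownership; unlocking lets a blocked writer proceed. */
static void echo_page_model_disown(struct echo_page_model *pg, const void *io)
{
        assert(pg->ep_owner == io);
        pg->ep_owner = NULL;
        pthread_mutex_unlock(&pg->ep_lock);
}

/* One stream of page writes; the job pointer doubles as the io token. */
struct write_job {
        struct echo_page_model *pg;
        int                     iterations;
        long                   *writes;   /* only touched while owning pg */
};

static void *write_thread(void *arg)
{
        struct write_job *job = arg;

        for (int i = 0; i < job->iterations; i++) {
                echo_page_model_own(job->pg, job);
                (*job->writes)++;
                echo_page_model_disown(job->pg, job);
        }
        return NULL;
}

/* Run a write and a rewrite of the same page concurrently and return
 * how many page writes completed. */
static long run_write_and_rewrite(struct echo_page_model *pg, int iterations)
{
        long writes = 0;
        struct write_job first = { pg, iterations, &writes };
        struct write_job second = { pg, iterations, &writes };
        pthread_t t1, t2;

        pthread_create(&t1, NULL, write_thread, &first);
        pthread_create(&t2, NULL, write_thread, &second);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return writes;
}
```

After `echo_page_model_init(&pg)`, calling `run_write_and_rewrite(&pg, 10000)` should return 20000 and leave the page unowned: the second writer is serialized behind the first instead of finding the page already owned.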
| Comment by Jian Yu [ 01/Apr/11 ] |
|
While running obdfilter-survey test 2a on the latest stable CentOS5/x86_64 master build (#139 for client, #178 for server), I also hit the same LBUG on the client node. Here is the syslog:
Apr 1 05:47:45 client-4 kernel: Lustre: DEBUG MARKER: == obdfilter-survey test 2a: Stripe F/S over the Network ============================================= 05:47:45 (1301662065)
Apr 1 05:47:45 client-4 xinetd[3129]: EXIT: shell status=0 pid=8025 duration=0(sec)
Apr 1 05:47:45 client-4 kernel: Lustre: 8139:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import lustre-OST0000_osc->host_0_UUID netid 50000: select flavor null
Apr 1 05:47:45 client-4 kernel: Lustre: 8139:0:(sec.c:1474:sptlrpc_import_sec_adapt()) Skipped 5 previous similar messages
Message from syslogd@ at Fri Apr 1 05:48:14 2011 ...
client-4 kernel: LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) ASSERTION(0) failed
Message from syslogd@ at Fri Apr 1 05:48:14 2011 ...
client-4 kernel: LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) LBUG
Apr 1 05:48:13 client-4 kernel: LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) page@ffff8103084c97b8[2 ffff810318366bc8:119002 ^0000000000000000_0000000000000000 1 0 2 ffff81030bf9cb38 0000000000000000 0x0]
Apr 1 05:48:13 client-4 kernel: LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) echo_client-page@ffff8103080a3198 vm@ffff81010aa26e18
Apr 1 05:48:14 client-4 kernel: format at cl_page.c:986:cl_page_own0 doesn't end in newline
Apr 1 05:48:14 client-4 kernel: LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) osc-page@ffff810308783280: 1< 0x845fed 258 0 - - - > 2< 487432192 0 0x0 0x308 | 0000000000000000 ffff81030fe485e8 ffff81030bf89c00 ffffffff88a37be0 ffff810308783280 > 3< - ffff810331d85820 0 0 1 > 4< 0 7 8 28311552 - | - - - - > 5< - - - - | 0 - - | 0 - -<3>LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) end page@ffff8103084c97b8
Apr 1 05:48:14 client-4 kernel: LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) pg->cp_owner == NULL
Apr 1 05:48:14 client-4 kernel: LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) ASSERTION(0) failed
Apr 1 05:48:14 client-4 kernel: LustreError: 8354:0:(cl_page.c:986:cl_page_own0()) LBUG
Apr 1 05:48:14 client-4 kernel: Pid: 8354, comm: lctl |
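The last lines of this log indicate which check fired: cl_page_own0() printed `pg->cp_owner == NULL`, which appears to be the condition it expected, i.e. the page was supposed to have no owner but already had one. Combined with the analysis in the earlier comment (a rewrite arriving while the previous write to the same page is still in flight), the failure mode can be modelled as below. This is a hypothetical sketch: `struct page_model`, `page_model_own_nolock()` and `page_model_disown()` are invented for illustration and are not Lustre code, and the real function hits LBUG instead of returning -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified page; `owner` stands in for cl_page::cp_owner. */
struct page_model {
        const char *owner;
};

/* Ownership with no per-page lock. The real code asserts that the page
 * has no owner and LBUGs when it does; this sketch reports the
 * violation and returns -1 so the outcome can be observed. */
static int page_model_own_nolock(struct page_model *pg, const char *io)
{
        if (pg->owner != NULL) {
                fprintf(stderr, "%s: page still owned by %s: "
                        "pg->cp_owner == NULL does not hold\n",
                        io, pg->owner);
                return -1;
        }
        pg->owner = io;
        return 0;
}

/* Called when the owning I/O completes. */
static void page_model_disown(struct page_model *pg)
{
        assert(pg->owner != NULL);   /* disowning an unowned page is a bug */
        pg->owner = NULL;
}
```

In this model, a "write" owns the page and, while it is still unfinished, a "rewrite" calling `page_model_own_nolock()` gets -1, mirroring the assertion above; once the write calls `page_model_disown()`, the rewrite succeeds.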
| Comment by Jinshan Xiong (Inactive) [ 26/Apr/11 ] |
|
I've verified the patch at http://review.whamcloud.com/#change,462; it fixes the problem. Without this patch, obdfilter-survey.sh hits this problem often; after applying the patch, the problem has never been hit again. |
| Comment by Jinshan Xiong (Inactive) [ 04/May/11 ] |
|
AutoTest passed at: https://maloo.whamcloud.com/test_sessions/c8fd045c-75f4-11e0-a1b3-52540025f9af
There is one failure, in ost-pools:test-18. I think this is a known issue, and I have filed a ticket at Please check it. |
| Comment by Build Master (Inactive) [ 05/May/11 ] |
|
Integrated in Oleg Drokin : 8861ce8829752d29ef6afd49b5e046f306d93b5e
|
| Comment by Peter Jones [ 05/May/11 ] |
|
Landed for 2.1. Please reopen if this issue recurs with this fix in place. |
| Comment by Sarah Liu [ 11/May/11 ] |
|
Ran 8 tests, including obdfilter-survey; they passed: https://maloo.whamcloud.com/test_sets/0e790810-7c34-11e0-b5bf-52540025f9af |