[LU-5471] Assertion at cl_page_assume for HSM Created: 11/Aug/14  Updated: 30/Jan/15  Resolved: 11/Aug/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jinshan Xiong (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 15242

 Description   

The stack trace is as follows:

LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) page@ffff88062850a7e0[2 ffff880bf2500e18 0 1834972009 1 ffff880633900065 0000000000000003 0x0]
LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) vvp-page@ffff88062850a880(0:0:0) vm@ffffea00154cfbd8 40000000000801 4:0 ffff88062850a7e0 0 lru
LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) lov-page@ffff88062850a8d8, raid0
LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) osc-page@ffff88062851a700 0: 1< 0x845fed 0 1 - - > 2< 0 0 0 0x10000 0x100 | 0001000000000000 ffff880c090f3620 ffff88063267cea8 > 3< - (null) 0 18446462603027842665 0 > 4< 0 0 8 10481664 - | - - - - > 5< - - - - | 0 - | 0 - ->
LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) end page@ffff88062850a7e0
LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) pg->cp_owner == NULL
LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) ASSERTION( 0 ) failed:
Aug 5 13:23:35 LustreError: 9876:0:(cl_page.c:698:cl_page_assume()) LBUG
Pid: 9876, comm: cat
Call Trace:
[<ffffffffa0839895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0839e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0999002>] cl_page_assume+0x212/0x220 [obdclass]
[<ffffffffa0f03f73>] ll_readpage+0x83/0x1a0 [lustre]
[<ffffffff81120f6c>] generic_file_aio_read+0x1fc/0x700
[<ffffffffa0f2a48f>] ? cl_glimpse_lock+0x1df/0x490 [lustre]
[<ffffffffa0f34f99>] vvp_io_read_start+0x259/0x470 [lustre]
[<ffffffffa09a199a>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa09a5b24>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0ed6f06>] ll_file_io_generic+0x2b6/0x710 [lustre]
[<ffffffffa0ed7faf>] ll_file_aio_read+0x13f/0x2c0 [lustre]
[<ffffffffa0ed829c>] ll_file_read+0x16c/0x2a0 [lustre]
[<ffffffff81189365>] vfs_read+0xb5/0x1a0
[<ffffffff811894a1>] sys_read+0x51/0x90
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

The root cause is that the page size must be adjusted after a file loses its layout; otherwise, the page size keeps increasing on every layout change and eventually overflows the u16 integer that records the current page size.
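
The overflow described above can be illustrated with a small standalone sketch. This is not the Lustre code: the struct, the function names, and the sizes (BASE_SIZE, LAYER_SIZE) are all hypothetical and chosen only to show how a u16 counter that grows on every layout attach, and is never reset when the layout is lost, eventually wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-object header; NOT the real Lustre struct. */
struct obj_header {
    uint16_t page_size;   /* bytes reserved per page, kept in a u16 */
};

/* Illustrative sizes only; the real values are not given in this ticket. */
enum { BASE_SIZE = 96, LAYER_SIZE = 136 };

/* Attaching a layout adds the layer's share on top of the current size. */
static void layout_attach(struct obj_header *h)
{
    h->page_size = (uint16_t)(h->page_size + LAYER_SIZE);
}

/* Assumed shape of the fix: losing the layout resets the size to the base. */
static void layout_detach_reset(struct obj_header *h)
{
    h->page_size = BASE_SIZE;
}

/*
 * Simulate release/restore cycles. Returns the 1-based cycle on which the
 * u16 wraps around, or -1 if it never wraps within max_cycles.
 * reset_on_detach == 0 models the bug, non-zero models the assumed fix.
 */
static int cycles_until_wrap(int reset_on_detach, int max_cycles)
{
    struct obj_header h = { .page_size = BASE_SIZE };

    assert(max_cycles >= 0);
    layout_attach(&h);                      /* initial layout */

    for (int i = 1; i <= max_cycles; i++) {
        uint16_t before;

        if (reset_on_detach)
            layout_detach_reset(&h);        /* release: layout is lost */
        before = h.page_size;
        layout_attach(&h);                  /* restore: layout comes back */
        if (h.page_size < before)
            return i;                       /* counter wrapped around */
    }
    return -1;
}
```

With these illustrative constants, `cycles_until_wrap(0, 1000)` reports a wrap on cycle 481, while `cycles_until_wrap(1, 1000)` returns -1 because the size is reset each time the layout is lost.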

This issue can be easily reproduced by releasing and restoring an HSM file in a loop.



 Comments   
Comment by Andrew Moe [ 29/Jan/15 ]

Jinshan, is there a patch associated with this fix? I have encountered this bug as well. Should I expect this to be fixed in a particular Lustre release?

Comment by Frank Zago (Inactive) [ 30/Jan/15 ]

Andy, it's probably the same bug as LU-5459.

Generated at Sat Feb 10 01:51:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.