[LU-4852] osc_extent_truncate()) ASSERTION( !ext->oe_urgent ) failed
Created: 02/Apr/14  Updated: 11/Jun/14  Resolved: 11/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.5.2

Type: Bug
Priority: Critical
Reporter: Andriy Skulysh
Assignee: Lai Siyao
Resolution: Fixed
Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 13384

 Description   

> 19:39:34 LustreError: 18612:0:(osc_cache.c:933:osc_extent_truncate()) ASSERTION( !ext->oe_urgent ) failed:
> 19:39:34 LustreError: 18612:0:(osc_cache.c:933:osc_extent_truncate()) LBUG
> 19:39:34 Pid: 18612, comm: iozone
> 19:39:34 Call Trace:
> 19:39:34 [<ffffffff81005eb9>] try_stack_unwind+0x169/0x1b0
> 19:39:34 [<ffffffff81004919>] dump_trace+0x89/0x450
> 19:39:34 [<ffffffffa02388d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
> 19:39:34 [<ffffffffa0238e37>] lbug_with_loc+0x47/0xc0 [libcfs]
> 19:39:34 [<ffffffffa071f7ed>] osc_cache_truncate_start+0x204d/0x2320 [osc]
> 19:39:34 [<ffffffffa070e972>] osc_io_setattr_start+0x212/0x460 [osc]
> 19:39:34 [<ffffffffa038dea2>] cl_io_start+0x72/0x140 [obdclass]
> 19:39:34 [<ffffffffa079f047>] lov_io_call+0x87/0x130 [lov]
> 19:39:34 [<ffffffffa07a26e5>] lov_io_start+0x115/0x180 [lov]
> 19:39:34 [<ffffffffa038dea2>] cl_io_start+0x72/0x140 [obdclass]
> 19:39:34 [<ffffffffa0392134>] cl_io_loop+0xb4/0x1b0 [obdclass]
> 19:39:34 [<ffffffffa086f8f0>] cl_setattr_ost+0x260/0x300 [lustre]
> 19:39:34 [<ffffffffa0843aaa>] ll_setattr_raw+0xa4a/0x10a0 [lustre]
> 19:39:34 [<ffffffffa0844157>] ll_setattr+0x57/0x100 [lustre]
> 19:39:34 [<ffffffff8116faf0>] notify_change+0x120/0x310
> 19:39:34 [<ffffffff81154451>] do_truncate+0x61/0x90
> 19:39:34 [<ffffffff81162ec1>] do_last+0x5d1/0x7d0
> 19:39:34 [<ffffffff81163c08>] path_openat+0xc8/0x3d0
> 19:39:34 [<ffffffff81164038>] do_filp_open+0x48/0xa0
> 19:39:34 [<ffffffff81154c4e>] do_sys_open+0x16e/0x240
> 19:39:34 [<ffffffff81154d60>] sys_open+0x20/0x30
> 19:39:34 [<ffffffff81154d85>] sys_creat+0x15/0x20
> 19:39:34 [<ffffffff8145512b>] system_call_fastpath+0x16/0x1b
> 19:39:34 [<00000000201d2e60>] 0x201d2e60
> 19:39:34 Kernel panic - not syncing: LBUG



 Comments   
Comment by Andriy Skulysh [ 02/Apr/14 ]

The bug is caused by a race between truncate and fsync. The problem is in osc_extent_wait().
Two flags are used for changing state: oe_trunc_pending and oe_urgent.
osc_cache_truncate_start() waits for all oe_urgent extents to be written.
But osc_extent_wait() doesn't take oe_trunc_pending into account when it sets oe_urgent.
The race arises after osc_object_unlock().
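
The interleaving described above can be sketched with a small stand-alone model. This is not the Lustre source and not the contents of the patch: struct ext_model and every function name below are hypothetical, and only the two flag names come from the comment. It assumes the fix amounts to checking oe_trunc_pending before setting oe_urgent; the actual change may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two state flags named in the comment.
 * This is NOT the real struct osc_extent. */
struct ext_model {
    bool oe_urgent;        /* extent should be written out soon */
    bool oe_trunc_pending; /* a truncate has claimed this extent */
};

/* Truncate side: claim the extent. In this model the claim is only
 * legal while the extent is not urgent, which is the state the
 * truncate path waits for before proceeding. */
void truncate_claim(struct ext_model *ext)
{
    assert(!ext->oe_urgent);
    ext->oe_trunc_pending = true;
}

/* Wait side without the check: marks the extent urgent regardless of
 * whether a truncate has already claimed it. */
void wait_mark_urgent_unchecked(struct ext_model *ext)
{
    ext->oe_urgent = true;
}

/* Wait side with the assumed check: leaves the extent alone once a
 * truncate is pending on it. */
void wait_mark_urgent_checked(struct ext_model *ext)
{
    if (!ext->oe_trunc_pending)
        ext->oe_urgent = true;
}

/* Mirrors the condition in the failed assertion, !ext->oe_urgent. */
bool truncate_assertion_holds(const struct ext_model *ext)
{
    return !ext->oe_urgent;
}
```

Calling truncate_claim() and then wait_mark_urgent_unchecked() on the same extent leaves oe_urgent set while the truncate is pending, so truncate_assertion_holds() returns false, which corresponds to the LBUG in the description. With wait_mark_urgent_checked() in the same order, the flag stays clear and the condition holds.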

Comment by Andriy Skulysh [ 02/Apr/14 ]

patch: http://review.whamcloud.com/9866

Comment by Jodi Levi (Inactive) [ 02/Apr/14 ]

Lai,
Could you please keep an eye on this patch?
Thank you!

Comment by Jinshan Xiong (Inactive) [ 25/Apr/14 ]

Hi Andriy,

Did you see this issue in b2_4 or master?

Comment by Patrick Farrell (Inactive) [ 25/Apr/14 ]

Jinshan - We saw this in 2.5, I believe. Possibly 2.4 as well. Not master, but we don't do much testing with master (yet).

Comment by Andriy Skulysh [ 04/May/14 ]

patch for b2_5: http://review.whamcloud.com/#/c/10204/

Comment by Peter Jones [ 11/Jun/14 ]

Fixed in master by other means, so the patch is only needed in 2.5.2.
