[LU-6666] osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) Created: 31/May/15  Updated: 04/Jun/19  Resolved: 30/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Di Wang Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File lustre-log.1432912775.19811.gz    
Issue Links:
Duplicate
Related
is related to LU-6227 Master testing: (osc_request.c:1219:o... Resolved
is related to LU-6776 osc_request.c:1670:osc_build_rpc()) A... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I met this during racer. This looks like LU-6227, but not in directIO patch.

 
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
VFS: Error -28 occurred while creating quota.
LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 6/27 pg: ffff8801efd6d640 off: 24576, count: 3272
LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) LBUG
Pid: 19811, comm: ptlrpcd_4

Call Trace:
 [<ffffffffa116b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa116be77>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa08a4b55>] osc_brw_prep_request+0xc35/0x10a0 [osc]
 [<ffffffffa08b4471>] ? osc_req_attr_set+0x1b1/0x740 [osc]
 [<ffffffffa08a5859>] osc_build_rpc+0x899/0x15c0 [osc]
 [<ffffffffa08c0eda>] osc_io_unplug0+0x115a/0x1b40 [osc]
 [<ffffffffa08b9a83>] ? osc_ap_completion+0x213/0x600 [osc]
 [<ffffffffa156d8bb>] ? lu_object_put+0x12b/0x310 [obdclass]
 [<ffffffffa08c3e61>] osc_io_unplug+0x11/0x20 [osc]
 [<ffffffffa08a746f>] brw_interpret+0x9bf/0x1fa0 [osc]
 [<ffffffffa060eadc>] ? ptlrpc_free_committed+0x56c/0x770 [ptlrpc]
 [<ffffffffa061bdb2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
 [<ffffffffa0610772>] ? after_reply+0xcb2/0xeb0 [ptlrpc]
 [<ffffffffa0614ab1>] ptlrpc_check_set+0x331/0x1c70 [ptlrpc]
 [<ffffffff81087fdb>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa0642393>] ptlrpcd_check+0x533/0x550 [ptlrpc]
 [<ffffffffa06429cb>] ptlrpcd+0x35b/0x430 [ptlrpc]
 [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
 [<ffffffffa0642670>] ? ptlrpcd+0x0/0x430 [ptlrpc]
 [<ffffffff8109e66e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20


 Comments   
Comment by Di Wang [ 31/May/15 ]

Btw: the build already includes the fix in LU-6227.

Comment by Zhenyu Xu [ 02/Jun/15 ]

in osc_build_rpc(), and in "/* add pages into rpc_list to build BRW rpc */" section, it makes sure that the middle pages has oap_page_off ==0 && oap_count == PAGE_CACHE_SIZE, and the error log shows that the 7th page has count == 3272 (its offset is PAGE aligned), could it be the page cache has been truncated while the I/O is undergoing?

Comment by Zhenyu Xu [ 03/Jun/15 ]

Di, is it reproducible? Can I get the core dump and logs?

Comment by Di Wang [ 03/Jun/15 ]

here you are

Comment by Zhenyu Xu [ 03/Jun/15 ]

also need core dump

Comment by Di Wang [ 03/Jun/15 ]

Sorry, I do not have that.

Comment by Zhenyu Xu [ 03/Jun/15 ]

Can you provide how to reproduce the issue?

Comment by Di Wang [ 08/Jun/15 ]

Hi, Bobi

I tried a few times on my branch. Sorry I can not reproduce it. Maybe it is just temporary.

Comment by Zhenyu Xu [ 08/Jun/15 ]

close it for now until we catch it again.

Comment by Artem Blagodarenko (Inactive) [ 29/Jun/15 ]

Happened again with some another symptoms (LBUGs)

LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed:
LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed:
LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) LBUG
c0-0c0s10n1:
LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 1/2 pg: ffff88044b96cf78 off: 7884799, count: 1
LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) LBUG
Comment by Artem Blagodarenko (Inactive) [ 29/Jun/15 ]

Some additional information. Thread A
There are two objects

struct osc_async_page {
 ...
  oap_obj_off = 0x23000, 
  oap_page_off = 0x873, 
...  
  count = 0x41d, 
...
}

struct osc_async_page {
  oap_obj_off = 0x26000, 
  oap_page_off = 0x5eb, 
...
    count = 0xa15, 
}

so LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed: happened
thread B

oap_obj_off = 0x1f000, 
  oap_page_off = 0x351, 
...
    count = 0x342, 
...

and

oap_obj_off = 0x4e000, 
  oap_page_off = 0x542, 
  count = 0x1c4, 

so,

LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed

Comment by Artem Blagodarenko (Inactive) [ 29/Jun/15 ]

Is it possible that pages from output above can be in one rpc after landing patches from https://jira.hpdd.intel.com/browse/LU-1669?

Range lock will be worked out to allow write to a shared file in parallel, while Posix compliance is still guaranteed.
Comment by Jinshan Xiong (Inactive) [ 29/Jun/15 ]

Artem, were you doing direct I/O? Otherwise this looks like a race between osc_refresh_count() and osc_page_touch_at(). Now that you can see pages in RPCs in question, can you post segments of continuous pages?

Comment by Artem Blagodarenko (Inactive) [ 30/Jun/15 ]

>Now that you can see pages in RPCs in question, can you post segments of continuous pages?

Jinshan Xiong, I can't understed what exact information I need to post. Can you give some details.
Post both brw_pages now.

crash-7.0.1-29bit> search -K 0xffff88044b96cf78
ffff88044a218728: ffff88044b96cf78 
ffff88045392d858: ffff88044b96cf78 

crash-7.0.1-29bit> brw_page ffff88044c771378
struct brw_page {
  off = 7872511, 
  pg = 0xffffea000f8698d0, 
  count = 1, 
  flag = 8
}
crash-7.0.1-29bit> ffff88044b96cf78
crash-7.0.1-29bit: command not found: ffff88044b96cf78
crash-7.0.1-29bit> brw_page ffff88044b96cf78
struct brw_page {
  off = 7884799, 
  pg = 0xffffea000f865500, 
  count = 1, 
  flag = 8
}
Comment by Artem Blagodarenko (Inactive) [ 30/Jun/15 ]

>Artem, were you doing direct I/O?
Can't determine is this direct I/O from crash dump. I believe them have the same code path.

Comment by Jinshan Xiong (Inactive) [ 01/Jul/15 ]

Hi Artem,

This issue is indeed introduced by LU-1669. When OST is out of space, and two processes write disjoint file range that don't overlap by page basis, the two writes can be merged into the same synchronous write RPC, and then hit this issue.

This problem can be fixed by adding a mutex in vvp_io_commit_sync() to serialize sync threads.

Comment by Gerrit Updater [ 02/Jul/15 ]

Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/15468
Subject: LU-6666 osc: Do not merge extents with partial pages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: af031ebb20db32e0ab01a558b05442efcede5dbf

Comment by Andreas Dilger [ 03/Jul/15 ]

If the pages are cached on the client (i.e. not O_DIRECT) then the full page must be cached since it isn't possible to mark only part of a page dirty, so I don't see how there can be multiple partial-page writes to the same file being merged?

Comment by Jinshan Xiong (Inactive) [ 03/Jul/15 ]

This can only happen when OST is running out of space, so the victim client can't cache page any more instead it uses sync I/O to write pages. For example, the first thread writes [0, 1024) and the 2nd thread writes [8192, 9216), these writes are not conflicted by range locks, and finally these two writes are picked by the same RPC and then hit this bug.

Comment by Patrick Farrell (Inactive) [ 03/Jul/15 ]

Jinshan - Are you saying the OST must be running out of space for this to happen? I'm almost certain in the instance Artem is talking about (the dump is from Cray), the OST is not low on space.

I'm puzzled by why the OST running out of space would impact client caching? Have I misunderstood?

One last thing - Why is it specifically unsafe to merge two extents with partial pages? (if I'm reading right, it is safe to merge one extent with a partial page with another that doesn't have a partial page.)

Comment by Andreas Dilger [ 03/Jul/15 ]

Patrick, it might also happen if the user runs out of quota, I believe.

The reason it isn't ok to merge multiple fragments is because of LNet RDMA. The way the data is packed into the bulk RPC can't have gaps in the middle due to some RDMA implementations, only a fragment at the end (which is very common).

If the pages were cached on the client it would be possible to just expand the write to cover the whole page in the middle of the file, since the client needs to do a read-modify-write of the page. With O_DIRECT that isn't possible. Not knowledgable enough in CLIO to know whether the out-of-space handling could be changed to allow this or not.

Comment by Patrick Farrell (Inactive) [ 15/Jul/15 ]

Andreas, Jinshan - Apologies if I'm misunderstanding, but more questions...

You're saying an RPC can't have gaps in the middle. I'm wondering about the meaning of gap. Does that mean it's an issue to have any gap in the data in a bulk RPC, like two non-contiguous pages? Or is the meaning of 'gap' limited to a fragmented page?

If the first meaning (including non-contiguous full pages) is correct, I don't see how Jinshan's patch keeps us safe by preventing merging partial pages. (As I don't see how the code enforces merging only contiguous extents, though the comment on get_write_extents implies that's the case.)

Comment by Andreas Dilger [ 17/Sep/15 ]

Patrick, sorry for long delay in replying. Having full-page holes in the middle of an RPC is fine, since a BRW RPC can have multiple niobufs in it. The problem is with partial page writes. That used to happen with liblustre (deprecated userspace library), but I'm not even sure it is possible with llite (kernel VFS client) since always full pages are written unless it is at EOF, or possibly with O_DIRECT?

The other question is whether this bug is still being hit with recent builds, or has it been fixed by something else and could be closed?

Comment by Andreas Dilger [ 02/Oct/15 ]

Patrick, Artem, are you able to test the patch http://review.whamcloud.com/15468 to see if this resolves your problem?

Comment by Patrick Farrell (Inactive) [ 16/Oct/15 ]

I was just able to test this - It does resolve the assertions in our 4 MiB RPC testing. I don't have a specific, well defined test case, however. And I'm still fairly sure we're not out of space or out of quota when hitting this.

We're hitting this when running a few different tests. One of them is growfiles from ltp.

This stupid little shell script hit for us eventually, but only on nodes with a number of real CPUs - no luck in VMs. (The numbers before the tests are internal Cray test numbers):

#29
ARG1='-W gf29 -b -D 0 -r 1-4096 -R 0-33554432 -i 0 -L 60 -C 1 -u gfsparse-3-$$'
#117
ARG2='-W gf117 -b -e 1 -i 0 -L 120 -u -g 5000 -T 100 -t 499990 -C 10 -c 1000 -S 10 -f Lgf03'
#309
ARG3=' -W gf309 -b -e 1 -u -r 1-5000 -R 0--1 -i 0 -L 30 -C 1 -I p g_rand12 g_rand12.2'
#701
ARG4='-W gf701 -b -e 1 -u -i 0 -L 20 -w -C 1 -I r -T 10 glseek20 glseek20.2'
#811
ARG5='-W gf811 -b -e 1 -u -r 1-5000 -i 0 -L 30 -C 1 -I L g_lio15 g_lio15.2'
#814
ARG6='-W gf814 -b -e 1 -u -i 0 -L 20 -w -C 1 -T 10 glseek19 glseek19.2'
#815
ARG7='-W gf815 -b -e 1 -u -r 1-49600 -I r -u -i 0 -L 120 Lgfile1'

(./growfiles $ARG1)&
(./growfiles $ARG2)&
(./growfiles $ARG3)&
(./growfiles $ARG4)&
(./growfiles $ARG5)&
(./growfiles $ARG6)&
(./growfiles $ARG7)&
Comment by Gerrit Updater [ 30/Nov/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15468/
Subject: LU-6666 osc: Do not merge extents with partial pages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ac1d6ab73c733caf2fa0ccf504955b23d3e572f0

Comment by Joseph Gmitter (Inactive) [ 30/Nov/15 ]

Landed for 2.8

Generated at Sat Feb 10 02:02:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.