
[LU-6666] osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.8.0
    • Fix Version/s: Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      I hit this during racer. This looks like LU-6227, but not in the directIO patch.

       
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 6/27 pg: ffff8801efd6d640 off: 24576, count: 3272
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) LBUG
      Pid: 19811, comm: ptlrpcd_4
      
      Call Trace:
       [<ffffffffa116b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa116be77>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa08a4b55>] osc_brw_prep_request+0xc35/0x10a0 [osc]
       [<ffffffffa08b4471>] ? osc_req_attr_set+0x1b1/0x740 [osc]
       [<ffffffffa08a5859>] osc_build_rpc+0x899/0x15c0 [osc]
       [<ffffffffa08c0eda>] osc_io_unplug0+0x115a/0x1b40 [osc]
       [<ffffffffa08b9a83>] ? osc_ap_completion+0x213/0x600 [osc]
       [<ffffffffa156d8bb>] ? lu_object_put+0x12b/0x310 [obdclass]
       [<ffffffffa08c3e61>] osc_io_unplug+0x11/0x20 [osc]
       [<ffffffffa08a746f>] brw_interpret+0x9bf/0x1fa0 [osc]
       [<ffffffffa060eadc>] ? ptlrpc_free_committed+0x56c/0x770 [ptlrpc]
       [<ffffffffa061bdb2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
       [<ffffffffa0610772>] ? after_reply+0xcb2/0xeb0 [ptlrpc]
       [<ffffffffa0614ab1>] ptlrpc_check_set+0x331/0x1c70 [ptlrpc]
       [<ffffffff81087fdb>] ? try_to_del_timer_sync+0x7b/0xe0
       [<ffffffffa0642393>] ptlrpcd_check+0x533/0x550 [ptlrpc]
       [<ffffffffa06429cb>] ptlrpcd+0x35b/0x430 [ptlrpc]
       [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
       [<ffffffffa0642670>] ? ptlrpcd+0x0/0x430 [ptlrpc]
       [<ffffffff8109e66e>] kthread+0x9e/0xc0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
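
      The assertion encodes the layout rule for pages packed into a single BRW RPC: when the RPC carries more than one page, only the first page may start at a non-zero offset within its page (and must then run to the end of that page), every interior page must be exactly one full page, and only the last page may end short. Below is a minimal standalone sketch of that check, assuming 4 KiB pages; the names are illustrative, not the Lustre source, and PAGE_SIZE stands in for the kernel's PAGE_CACHE_SIZE.

      #include <assert.h>
      #include <stddef.h>

      #define PAGE_SIZE 4096u                /* stands in for PAGE_CACHE_SIZE */

      struct brw_page_stub {
              unsigned long off;             /* byte offset of the fragment in the file */
              unsigned int  count;           /* bytes carried in this page */
      };

      /* illustrative restatement of the LASSERT in osc_brw_prep_request() */
      static void check_brw_pages(const struct brw_page_stub *pg, size_t page_count)
      {
              for (size_t i = 0; i < page_count; i++) {
                      unsigned int poff = pg[i].off & (PAGE_SIZE - 1);

                      if (page_count == 1)
                              continue;              /* a lone partial page is fine */
                      if (i == 0)                    /* first page must run to its end */
                              assert(poff + pg[i].count == PAGE_SIZE);
                      else if (i < page_count - 1)   /* interior pages must be full */
                              assert(poff == 0 && pg[i].count == PAGE_SIZE);
                      else                           /* last page must start aligned */
                              assert(poff == 0);
              }
      }

      Under that rule, the crash above trips the interior-page clause: page i = 6 of 27 is page-aligned (24576 is a multiple of 4096, so poff == 0) but carries only 3272 bytes instead of a full page.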
      

    Activity

            adilger Andreas Dilger added a comment

            Patrick, sorry for the long delay in replying. Having full-page holes in the middle of an RPC is fine, since a BRW RPC can have multiple niobufs in it. The problem is with partial-page writes. That used to happen with liblustre (the deprecated userspace library), but I'm not sure it is even possible with llite (the kernel VFS client), since full pages are always written unless the write is at EOF, or possibly with O_DIRECT?

            The other question is whether this bug is still being hit with recent builds, or has it been fixed by something else and could be closed?


            paf Patrick Farrell (Inactive) added a comment

            Andreas, Jinshan - apologies if I'm misunderstanding, but more questions...

            You're saying an RPC can't have gaps in the middle. I'm wondering about the meaning of gap. Does that mean it's an issue to have any gap in the data in a bulk RPC, like two non-contiguous pages? Or is the meaning of 'gap' limited to a fragmented page?

            If the first meaning (including non-contiguous full pages) is correct, I don't see how Jinshan's patch keeps us safe by refusing to merge partial pages. (I don't see how the code enforces merging only contiguous extents, though the comment on get_write_extents implies that's the case.)


            adilger Andreas Dilger added a comment

            Patrick, it might also happen if the user runs out of quota, I believe.

            The reason it isn't OK to merge multiple fragments is LNet RDMA. Because of some RDMA implementations, the data packed into the bulk RPC can't have gaps in the middle, only a fragment at the end (which is very common).

            If the pages were cached on the client, it would be possible to just expand the write to cover the whole page in the middle of the file, since the client needs to do a read-modify-write of the page anyway. With O_DIRECT that isn't possible. I'm not knowledgeable enough in CLIO to know whether the out-of-space handling could be changed to allow this or not.
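
            For what it's worth, the page-covering expansion Andreas describes is just rounding the byte range out to page boundaries. A hedged standalone sketch (plain arithmetic, not CLIO code), assuming 4 KiB pages:

            #include <stdio.h>

            #define PAGE_SIZE 4096ul

            int main(void)
            {
                    /* a cached partial write of [1024, 3272) is expanded by the
                     * read-modify-write of its page to the aligned range [0, 4096) */
                    unsigned long start = 1024, end = 3272;
                    unsigned long astart = start & ~(PAGE_SIZE - 1);
                    unsigned long aend = (end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

                    printf("[%lu, %lu) -> [%lu, %lu)\n", start, end, astart, aend);
                    return 0;
            }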


            paf Patrick Farrell (Inactive) added a comment

            Jinshan - Are you saying the OST must be running out of space for this to happen? I'm almost certain that in the instance Artem is talking about (the dump is from Cray), the OST is not low on space.

            I'm also puzzled as to why the OST running out of space would impact client caching. Have I misunderstood?

            One last thing - why is it specifically unsafe to merge two extents with partial pages? (If I'm reading right, it is safe to merge an extent that has a partial page with one that doesn't.)


            jay Jinshan Xiong (Inactive) added a comment

            This can only happen when the OST is running out of space, so the victim client can't cache pages any more and instead uses sync I/O to write them. For example, the first thread writes [0, 1024) and a second thread writes [8192, 9216); these writes do not conflict under range locks, so both are eventually picked up by the same RPC, which then hits this bug.
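
            A hedged standalone illustration of that scenario (stub types, not Lustre source), assuming 4 KiB pages, showing how the merged page list violates the assertion from the description:

            #include <stdio.h>

            #define PAGE_SIZE 4096u

            struct frag {
                    unsigned long off;    /* file offset of the fragment */
                    unsigned int  count;  /* bytes in the fragment */
            };

            int main(void)
            {
                    /* thread 1 wrote [0, 1024), thread 2 wrote [8192, 9216);
                     * both become partial pages in the same sync BRW RPC */
                    struct frag pg[2] = { { 0, 1024 }, { 8192, 1024 } };
                    unsigned int poff = pg[0].off & (PAGE_SIZE - 1);

                    /* the first-page clause demands poff + count == PAGE_SIZE */
                    printf("i=0: poff=%u count=%u -> %s\n", poff, pg[0].count,
                           poff + pg[0].count == PAGE_SIZE ? "ok" : "LBUG");
                    return 0;
            }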

            adilger Andreas Dilger added a comment (edited)

            If the pages are cached on the client (i.e. not O_DIRECT), then the full page must be cached, since it isn't possible to mark only part of a page dirty; so I don't see how there can be multiple partial-page writes to the same file being merged?


            gerrit Gerrit Updater added a comment

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/15468
            Subject: LU-6666 osc: Do not merge extents with partial pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: af031ebb20db32e0ab01a558b05442efcede5dbf
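
            A hedged sketch of the idea in the patch title (the names and types below are illustrative, not the actual change under review): refuse to merge any write extent that carries a partial page, so a fragment can only ever sit at the edge of an RPC, never in its interior.

            #include <stdbool.h>

            #define PAGE_SIZE 4096u

            struct wext {
                    unsigned long start;  /* first byte, inclusive */
                    unsigned long end;    /* one past the last byte */
            };

            /* an extent whose boundaries are not page-aligned has a partial
             * page at its head or tail */
            static bool has_partial_page(const struct wext *e)
            {
                    return (e->start & (PAGE_SIZE - 1)) != 0 ||
                           (e->end & (PAGE_SIZE - 1)) != 0;
            }

            /* merge policy: only full-page extents may share an RPC, so a
             * fragment can never land in the interior of the bulk */
            static bool can_merge(const struct wext *a, const struct wext *b)
            {
                    return !has_partial_page(a) && !has_partial_page(b);
            }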


            jay Jinshan Xiong (Inactive) added a comment

            Hi Artem,

            This issue was indeed introduced by LU-1669. When the OST is out of space and two processes write disjoint file ranges that don't overlap on a page basis, the two writes can be merged into the same synchronous write RPC and then hit this issue.

            This problem can be fixed by adding a mutex in vvp_io_commit_sync() to serialize sync threads.
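
            A minimal kernel-style sketch of that serialization, assuming a single hypothetical guard (vvp_sync_guard and commit_sync_pages are made-up names, not the actual Lustre code):

            #include <linux/mutex.h>

            /* hypothetical guard; not the actual Lustre change */
            static DEFINE_MUTEX(vvp_sync_guard);

            /* hypothetical stand-in for the existing sync-commit work */
            static int commit_sync_pages(void *io, void *queue);

            static int vvp_io_commit_sync_serialized(void *io, void *queue)
            {
                    int rc;

                    /* only one sync writer at a time builds its page list, so
                     * two partial-page writes cannot interleave into one RPC */
                    mutex_lock(&vvp_sync_guard);
                    rc = commit_sync_pages(io, queue);
                    mutex_unlock(&vvp_sync_guard);
                    return rc;
            }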


            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment (edited)

            > Artem, were you doing direct I/O?

            I can't determine from the crash dump whether this is direct I/O. I believe they take the same code path.


            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment (edited)

            > Now that you can see pages in RPCs in question, can you post segments of continuous pages?

            Jinshan Xiong, I can't understand exactly what information I need to post - can you give some details?

            Posting both brw_pages now:

            crash-7.0.1-29bit> search -K 0xffff88044b96cf78
            ffff88044a218728: ffff88044b96cf78 
            ffff88045392d858: ffff88044b96cf78 
            
            crash-7.0.1-29bit> brw_page ffff88044c771378
            struct brw_page {
              off = 7872511, 
              pg = 0xffffea000f8698d0, 
              count = 1, 
              flag = 8
            }
            crash-7.0.1-29bit> ffff88044b96cf78
            crash-7.0.1-29bit: command not found: ffff88044b96cf78
            crash-7.0.1-29bit> brw_page ffff88044b96cf78
            struct brw_page {
              off = 7884799, 
              pg = 0xffffea000f865500, 
              count = 1, 
              flag = 8
            }
            

            jay Jinshan Xiong (Inactive) added a comment

            Artem, were you doing direct I/O? Otherwise this looks like a race between osc_refresh_count() and osc_page_touch_at(). Now that you can see the pages in the RPCs in question, can you post the segments of contiguous pages?


    People

      Assignee: jay Jinshan Xiong (Inactive)
      Reporter: di.wang Di Wang
      Votes: 0
      Watchers: 13