[LU-3192] LBUG:(osc_request.c:1308:osc_brw_prep_request()) ASSERTION( i == 0 || pg->off > pg_prev->off ) Created: 18/Apr/13 Updated: 07/Nov/14 Resolved: 07/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Vitaly Fertman | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 7788 | ||||||||
| Description |
|
This happens to Lustre 2.3 running with SP1 but we have seen it in SP2 or 2.2(or even 2.1 with different stack trace, see LELUS-41): LustreError: 3821:0:(osc_request.c:819:osc_announce_cached()) dirty 361 - 362 > system dirty_max 2113536 Here are also some other instances all seem to hit the same page for the RPC: Lustre 2.3.0-trunk-1.0000.40706.58.4-abuild-trunk 27 First hit Cname # hits Apid/Roles 854758 doio_mpi LUS: LBUG-ASSERTION( i == 0 || pg->off > pg_prev->off ) failed: i 1 p_c 256 pg ffffea00179fdfa0 [pri 18446612160482967680 ind 46909] off 192139264 prev_pg ffffea000bd34598 [pri 0 ind 263425] off 192139264 855115 doio_mpi LUS: LBUG-ASSERTION( i == 0 || pg->off > pg_prev->off ) failed: i 32 p_c 155 pg ffffea001bcb95e0 [pri 0 ind 264145] off 14807040 prev_pg ffffea001bff2bd0 [pri 18446612166806628160 ind 3615] off 14807040 854062 doio_mpi |
| Comments |
| Comment by Keith Mannthey (Inactive) [ 18/Apr/13 ] |
|
What is 2.3 SP1 and SP2? Do you see this with the 2.3 branch in the git tree? Or with Master? |
| Comment by Peter Jones [ 18/Apr/13 ] |
|
Keith I think that Vitaly means SLES11 SP1 and SP2 |
| Comment by Vitaly Fertman [ 18/Apr/13 ] |
|
this is not correct to kick transient pages off the CLIO cache, because we may get 2 sets of pages in osc_build_req(), cacheable and transient ones, for the same offsets. http://review.whamcloud.com/6096 - revert of |
| Comment by Vitaly Fertman [ 18/Apr/13 ] |
|
lustre 2.3 on sles11 sp1/sp2 |
| Comment by Oleg Drokin [ 22/Apr/13 ] |
|
So can you please describe the scenario to hit this and the environment? |
| Comment by Vitaly Fertman [ 22/Apr/13 ] |
|
due to putting out transient pages of the page cache, we may get 2 sets of pages, cacheable and transient, for the same offsets, originated in BIO and DIO. later, when preparing rpc, such pages are gathered for the same rpc, and assertion that 2 pages with the same offset are found is hit. |
| Comment by Niu Yawei (Inactive) [ 23/Apr/13 ] |
|
Actually, I just started refresh the fix of |
| Comment by Andreas Dilger [ 24/Apr/13 ] |
|
Vitaly, can you please tell us what the workload is that is triggering this problem? Is this a synthetic workload that is generating "difficult" IO patterns with O_DIRECT and buffered IO on the same file offset, or is this a failure seen by a real application running from a user? I would be surprised if any real application will have an IO workload to trigger this, but then again application developers are known to do all kinds of strange things... If this is only from a synthetic workload, then I'm inclined to drop it as a blocker for the 2.4.0 release, since this is extremely unlikely to be hit under real-world usage. |
| Comment by Cheng Shao (Inactive) [ 24/Apr/13 ] |
|
Andreas, this is not some real application workload. It is from an internal stress test. |
| Comment by Jinshan Xiong (Inactive) [ 25/Apr/13 ] |
|
Let's mark the transient page as special in IO engine so that it can't be merged with caching pages. |
| Comment by Alexey Lyashkov [ 26/Apr/13 ] |
|
Jay, transient pages already marked as special and live in CLIO cache until we have IO finished, and merging with buffered pages exist in CLIO from beginning, so we may say - that is broken for now. Per Nikita`s comments - he tried to present a flat file as offset range with clio pages if it's covered by lock. That allowed to re-stripe it easy just a change mapping a some offset range to new osc object. from other view - OST uses a single page cache to store DIO and buffered IO so we have no differences (buffered IO may read DIO data without access to disk). so we have free to use same view on client side, have a single CLIO page cache to store a data before send over network. That way is better in case we will return to idea to create a p2p communication and client may send own data to new client without asking any data from a server. That is very usefull in excacale as extend buffer cache dramatically. Because of it, i think we need to return to put transient pages into CLIO cache, but remember that is special pages and need disowned after IO finished and don't forget we need to stay in waiting until that data committed on OST side and don't removed a request from sending queue, otherwise we will hit a |
| Comment by Jinshan Xiong (Inactive) [ 26/Apr/13 ] |
It was. But we found a problem in
This turns out too optimistic. Restriping is a case of layout change and right now we have a complete solution for layout change. The safest way to change a file's layout is to clean up file's page cache in prior. Don't bother talking about performance as changing layout is a rare operation
hmm.. I tend to think DIO shouldn't pollute cache in anyways - this is why it is called Direct IO. Even on the OST side, it should go directly into disks before replying the clients. Also it;s really dangerous to cache write of DIO on the OSTs. |
| Comment by Niu Yawei (Inactive) [ 27/Apr/13 ] |
|
Alexey, we've seen quite a few problems caused by putting 'transient' pages in the radix tree:
All in all, DIO pages are managed by applications, I don't see any reason to store them in the radix tree. (add them in the tree before io then remove on io completion looks only increase overhead). I've refreshed the patch of |
| Comment by Alexey Lyashkov [ 28/Apr/13 ] |
|
> It was. But we found a problem in Vitaly have fix for >This turns out too optimistic. Restriping is a case of layout change and right now we have a complete solution for layout change. The safest way to change a file's layout is to clean up file's page cache in prior. Don't bother talking about performance as changing layout is a rare operation why? if you change a layout - you just change a translation about offset in file, to data object and offset to object. main question in that case - have a stable point when old layout and new layout have a same contents. > hmm.. I tend to think DIO shouldn't pollute cache in anyways - this is why it is called Direct IO. Even on the OST side, it should go directly into disks before replying the clients. Also it;s really dangerous to cache write of DIO on the OSTs. by POSIX/SUS definition, DIO need bypass only pagecache, but don't see anything about raid cache, disk drive cache or other cache types... http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap03.html
Direct I/O
Historically, direct I/O refers to the system bypassing intermediate buffering, but may be extended to cover implementation-defined optimizations.
|
| Comment by Alexey Lyashkov [ 28/Apr/13 ] |
|
> Race of buffered io vs. dio described in this ticket & That ticket isn't race. that ticket is result of "fix" in > Unable to support unalinged DIO. Lustre now can only do aligned DIO (buffer address, offset & write count must be page size aligned), the reason we have such unnecessary That is big question. You need to read full page anyway on OST side and have full page write. as about lockless IO.. lock less IO more easy implement as direct access to osc_brw api or other ways but have similar to obdecho client. > All in all, DIO pages are managed by applications, I don't see any reason to store them in the radix tree. (add them in the tree before io then remove on io completion looks only increase overhead). if you don't plan to have it's in radix tree - why it's should be covered by CLIO page? kill these pages from CLIO that is reduce a complexity of CLIO (we will have a single page type). |
| Comment by Alexander Boyko [ 01/Jul/14 ] |
| Comment by Cory Spitz [ 07/Nov/14 ] |
|
http://review.whamcloud.com/#/c/10930 landed, so should this bug be closed? |
| Comment by Peter Jones [ 07/Nov/14 ] |
|
Yes it should - thanks! |