
[LU-6666] osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.8.0
    • Fix Version/s: Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      I hit this during racer. This looks like LU-6227, but not in the directIO patch.

       
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 6/27 pg: ffff8801efd6d640 off: 24576, count: 3272
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) LBUG
      Pid: 19811, comm: ptlrpcd_4
      
      Call Trace:
       [<ffffffffa116b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa116be77>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa08a4b55>] osc_brw_prep_request+0xc35/0x10a0 [osc]
       [<ffffffffa08b4471>] ? osc_req_attr_set+0x1b1/0x740 [osc]
       [<ffffffffa08a5859>] osc_build_rpc+0x899/0x15c0 [osc]
       [<ffffffffa08c0eda>] osc_io_unplug0+0x115a/0x1b40 [osc]
       [<ffffffffa08b9a83>] ? osc_ap_completion+0x213/0x600 [osc]
       [<ffffffffa156d8bb>] ? lu_object_put+0x12b/0x310 [obdclass]
       [<ffffffffa08c3e61>] osc_io_unplug+0x11/0x20 [osc]
       [<ffffffffa08a746f>] brw_interpret+0x9bf/0x1fa0 [osc]
       [<ffffffffa060eadc>] ? ptlrpc_free_committed+0x56c/0x770 [ptlrpc]
       [<ffffffffa061bdb2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
       [<ffffffffa0610772>] ? after_reply+0xcb2/0xeb0 [ptlrpc]
       [<ffffffffa0614ab1>] ptlrpc_check_set+0x331/0x1c70 [ptlrpc]
       [<ffffffff81087fdb>] ? try_to_del_timer_sync+0x7b/0xe0
       [<ffffffffa0642393>] ptlrpcd_check+0x533/0x550 [ptlrpc]
       [<ffffffffa06429cb>] ptlrpcd+0x35b/0x430 [ptlrpc]
       [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
       [<ffffffffa0642670>] ? ptlrpcd+0x0/0x430 [ptlrpc]
       [<ffffffff8109e66e>] kthread+0x9e/0xc0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
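
      The assertion encodes the layout rule for pages packed into a single BRW RPC: when the RPC carries more than one page, only the first page may start at a non-zero offset within its page (and must then run to the end of that page), every interior page must be exactly one full page, and only the last page may end short. Below is a minimal standalone sketch of that check, assuming 4 KiB pages; the names are illustrative, not the Lustre source, and PAGE_SIZE stands in for the kernel's PAGE_CACHE_SIZE.

      #include <assert.h>
      #include <stddef.h>

      #define PAGE_SIZE 4096u                /* stands in for PAGE_CACHE_SIZE */

      struct brw_page_stub {
              unsigned long off;             /* byte offset of the fragment in the file */
              unsigned int  count;           /* bytes carried in this page */
      };

      /* illustrative restatement of the LASSERT in osc_brw_prep_request() */
      static void check_brw_pages(const struct brw_page_stub *pg, size_t page_count)
      {
              for (size_t i = 0; i < page_count; i++) {
                      unsigned int poff = pg[i].off & (PAGE_SIZE - 1);

                      if (page_count == 1)
                              continue;              /* a lone partial page is fine */
                      if (i == 0)                    /* first page must run to its end */
                              assert(poff + pg[i].count == PAGE_SIZE);
                      else if (i < page_count - 1)   /* interior pages must be full */
                              assert(poff == 0 && pg[i].count == PAGE_SIZE);
                      else                           /* last page must start aligned */
                              assert(poff == 0);
              }
      }

      Under that rule, the crash above trips the interior-page clause: page i = 6 of 27 is page-aligned (24576 is a multiple of 4096, so poff == 0) but carries only 3272 bytes instead of a full page.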
      

    Activity

            adilger Andreas Dilger added a comment

            Patrick, sorry for the long delay in replying. Having full-page holes in the middle of an RPC is fine, since a BRW RPC can have multiple niobufs in it. The problem is with partial-page writes. That used to happen with liblustre (the deprecated userspace library), but I'm not sure it is even possible with llite (the kernel VFS client), since full pages are always written unless the write is at EOF, or possibly with O_DIRECT?

            The other question is whether this bug is still being hit with recent builds, or has it been fixed by something else and could be closed?


            paf Patrick Farrell (Inactive) added a comment

            Andreas, Jinshan - apologies if I'm misunderstanding, but more questions...

            You're saying an RPC can't have gaps in the middle. I'm wondering about the meaning of gap. Does that mean it's an issue to have any gap in the data in a bulk RPC, like two non-contiguous pages? Or is the meaning of 'gap' limited to a fragmented page?

            If the first meaning (including non-contiguous full pages) is correct, I don't see how Jinshan's patch keeps us safe by refusing to merge partial pages. (I don't see how the code enforces merging only contiguous extents, though the comment on get_write_extents implies that's the case.)


            adilger Andreas Dilger added a comment

            Patrick, it might also happen if the user runs out of quota, I believe.

            The reason it isn't OK to merge multiple fragments is LNet RDMA. Because of some RDMA implementations, the data packed into the bulk RPC can't have gaps in the middle, only a fragment at the end (which is very common).

            If the pages were cached on the client, it would be possible to just expand the write to cover the whole page in the middle of the file, since the client needs to do a read-modify-write of the page anyway. With O_DIRECT that isn't possible. I'm not knowledgeable enough in CLIO to know whether the out-of-space handling could be changed to allow this or not.
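
            For what it's worth, the page-covering expansion Andreas describes is just rounding the byte range out to page boundaries. A hedged standalone sketch (plain arithmetic, not CLIO code), assuming 4 KiB pages:

            #include <stdio.h>

            #define PAGE_SIZE 4096ul

            int main(void)
            {
                    /* a cached partial write of [1024, 3272) is expanded by the
                     * read-modify-write of its page to the aligned range [0, 4096) */
                    unsigned long start = 1024, end = 3272;
                    unsigned long astart = start & ~(PAGE_SIZE - 1);
                    unsigned long aend = (end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

                    printf("[%lu, %lu) -> [%lu, %lu)\n", start, end, astart, aend);
                    return 0;
            }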


            paf Patrick Farrell (Inactive) added a comment

            Jinshan - Are you saying the OST must be running out of space for this to happen? I'm almost certain that in the instance Artem is talking about (the dump is from Cray), the OST is not low on space.

            I'm also puzzled as to why the OST running out of space would impact client caching. Have I misunderstood?

            One last thing - why is it specifically unsafe to merge two extents with partial pages? (If I'm reading right, it is safe to merge an extent that has a partial page with one that doesn't.)


            jay Jinshan Xiong (Inactive) added a comment

            This can only happen when the OST is running out of space, so the victim client can't cache pages any more and instead uses sync I/O to write them. For example, the first thread writes [0, 1024) and a second thread writes [8192, 9216); these writes do not conflict under range locks, so both are eventually picked up by the same RPC, which then hits this bug.
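
            A hedged standalone illustration of that scenario (stub types, not Lustre source), assuming 4 KiB pages, showing how the merged page list violates the assertion from the description:

            #include <stdio.h>

            #define PAGE_SIZE 4096u

            struct frag {
                    unsigned long off;    /* file offset of the fragment */
                    unsigned int  count;  /* bytes in the fragment */
            };

            int main(void)
            {
                    /* thread 1 wrote [0, 1024), thread 2 wrote [8192, 9216);
                     * both become partial pages in the same sync BRW RPC */
                    struct frag pg[2] = { { 0, 1024 }, { 8192, 1024 } };
                    unsigned int poff = pg[0].off & (PAGE_SIZE - 1);

                    /* the first-page clause demands poff + count == PAGE_SIZE */
                    printf("i=0: poff=%u count=%u -> %s\n", poff, pg[0].count,
                           poff + pg[0].count == PAGE_SIZE ? "ok" : "LBUG");
                    return 0;
            }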

            adilger Andreas Dilger added a comment (edited)

            If the pages are cached on the client (i.e. not O_DIRECT), then the full page must be cached, since it isn't possible to mark only part of a page dirty; so I don't see how there can be multiple partial-page writes to the same file being merged?


            gerrit Gerrit Updater added a comment

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/15468
            Subject: LU-6666 osc: Do not merge extents with partial pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: af031ebb20db32e0ab01a558b05442efcede5dbf
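
            A hedged sketch of the idea in the patch title (the names and types below are illustrative, not the actual change under review): refuse to merge any write extent that carries a partial page, so a fragment can only ever sit at the edge of an RPC, never in its interior.

            #include <stdbool.h>

            #define PAGE_SIZE 4096u

            struct wext {
                    unsigned long start;  /* first byte, inclusive */
                    unsigned long end;    /* one past the last byte */
            };

            /* an extent whose boundaries are not page-aligned has a partial
             * page at its head or tail */
            static bool has_partial_page(const struct wext *e)
            {
                    return (e->start & (PAGE_SIZE - 1)) != 0 ||
                           (e->end & (PAGE_SIZE - 1)) != 0;
            }

            /* merge policy: only full-page extents may share an RPC, so a
             * fragment can never land in the interior of the bulk */
            static bool can_merge(const struct wext *a, const struct wext *b)
            {
                    return !has_partial_page(a) && !has_partial_page(b);
            }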


            jay Jinshan Xiong (Inactive) added a comment

            Hi Artem,

            This issue was indeed introduced by LU-1669. When the OST is out of space and two processes write disjoint file ranges that don't overlap on a page basis, the two writes can be merged into the same synchronous write RPC and then hit this issue.

            This problem can be fixed by adding a mutex in vvp_io_commit_sync() to serialize sync threads.
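
            A minimal kernel-style sketch of that serialization, assuming a single hypothetical guard (vvp_sync_guard and commit_sync_pages are made-up names, not the actual Lustre code):

            #include <linux/mutex.h>

            /* hypothetical guard; not the actual Lustre change */
            static DEFINE_MUTEX(vvp_sync_guard);

            /* hypothetical stand-in for the existing sync-commit work */
            static int commit_sync_pages(void *io, void *queue);

            static int vvp_io_commit_sync_serialized(void *io, void *queue)
            {
                    int rc;

                    /* only one sync writer at a time builds its page list, so
                     * two partial-page writes cannot interleave into one RPC */
                    mutex_lock(&vvp_sync_guard);
                    rc = commit_sync_pages(io, queue);
                    mutex_unlock(&vvp_sync_guard);
                    return rc;
            }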


            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment (edited)

            > Artem, were you doing direct I/O?

            I can't determine from the crash dump whether this is direct I/O. I believe they take the same code path.


            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment (edited)

            > Now that you can see pages in RPCs in question, can you post segments of continuous pages?

            Jinshan Xiong, I can't understand exactly what information I need to post - can you give some details?

            Posting both brw_pages now:

            crash-7.0.1-29bit> search -K 0xffff88044b96cf78
            ffff88044a218728: ffff88044b96cf78 
            ffff88045392d858: ffff88044b96cf78 
            
            crash-7.0.1-29bit> brw_page ffff88044c771378
            struct brw_page {
              off = 7872511, 
              pg = 0xffffea000f8698d0, 
              count = 1, 
              flag = 8
            }
            crash-7.0.1-29bit> ffff88044b96cf78
            crash-7.0.1-29bit: command not found: ffff88044b96cf78
            crash-7.0.1-29bit> brw_page ffff88044b96cf78
            struct brw_page {
              off = 7884799, 
              pg = 0xffffea000f865500, 
              count = 1, 
              flag = 8
            }
            

            jay Jinshan Xiong (Inactive) added a comment

            Artem, were you doing direct I/O? Otherwise this looks like a race between osc_refresh_count() and osc_page_touch_at(). Now that you can see the pages in the RPCs in question, can you post the segments of contiguous pages?


    People

      Assignee: jay Jinshan Xiong (Inactive)
      Reporter: di.wang Di Wang
      Votes: 0
      Watchers: 13