LU-6666

osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.8.0
    • Fix Version/s: Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      I hit this during racer. This looks like LU-6227, but not with the directIO patch.

       
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 6/27 pg: ffff8801efd6d640 off: 24576, count: 3272
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) LBUG
      Pid: 19811, comm: ptlrpcd_4
      
      Call Trace:
       [<ffffffffa116b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa116be77>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa08a4b55>] osc_brw_prep_request+0xc35/0x10a0 [osc]
       [<ffffffffa08b4471>] ? osc_req_attr_set+0x1b1/0x740 [osc]
       [<ffffffffa08a5859>] osc_build_rpc+0x899/0x15c0 [osc]
       [<ffffffffa08c0eda>] osc_io_unplug0+0x115a/0x1b40 [osc]
       [<ffffffffa08b9a83>] ? osc_ap_completion+0x213/0x600 [osc]
       [<ffffffffa156d8bb>] ? lu_object_put+0x12b/0x310 [obdclass]
       [<ffffffffa08c3e61>] osc_io_unplug+0x11/0x20 [osc]
       [<ffffffffa08a746f>] brw_interpret+0x9bf/0x1fa0 [osc]
       [<ffffffffa060eadc>] ? ptlrpc_free_committed+0x56c/0x770 [ptlrpc]
       [<ffffffffa061bdb2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
       [<ffffffffa0610772>] ? after_reply+0xcb2/0xeb0 [ptlrpc]
       [<ffffffffa0614ab1>] ptlrpc_check_set+0x331/0x1c70 [ptlrpc]
       [<ffffffff81087fdb>] ? try_to_del_timer_sync+0x7b/0xe0
       [<ffffffffa0642393>] ptlrpcd_check+0x533/0x550 [ptlrpc]
       [<ffffffffa06429cb>] ptlrpcd+0x35b/0x430 [ptlrpc]
       [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
       [<ffffffffa0642670>] ? ptlrpcd+0x0/0x430 [ptlrpc]
       [<ffffffff8109e66e>] kthread+0x9e/0xc0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
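
      A minimal C sketch of the invariant this LASSERT encodes, assuming 4 KiB pages and a
      simplified brw_page (the struct here is a stand-in, not the Lustre source): in a
      multi-page bulk, only the first page may start mid-page (and must then run to the page
      boundary), only the last page may end short (and must start on a boundary), and every
      interior page must be full. In the log above, page 6 of 27 sits at file offset 24576,
      so poff == 0, but count == 3272 != 4096, which violates the interior-page clause.

      #include <stdbool.h>

      #define PAGE_CACHE_SIZE 4096U

      struct brw_page_sketch {
              unsigned long off;   /* byte offset of the fragment within the file */
              unsigned int count;  /* bytes of data carried in this page */
      };

      /* Return true iff the page list has no hole in the interior of the bulk. */
      static bool bulk_layout_ok(const struct brw_page_sketch *pg, int n)
      {
              for (int i = 0; i < n; i++) {
                      unsigned int poff = pg[i].off % PAGE_CACHE_SIZE;

                      if (n == 1)
                              break;          /* a lone page may be fragmented arbitrarily */
                      if (i == 0 && poff + pg[i].count != PAGE_CACHE_SIZE)
                              return false;   /* first page must reach the page boundary */
                      if (i > 0 && i < n - 1 &&
                          (poff != 0 || pg[i].count != PAGE_CACHE_SIZE))
                              return false;   /* interior pages must be full */
                      if (i == n - 1 && poff != 0)
                              return false;   /* last page must start on a boundary */
              }
              return true;
      }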
      

          Activity

            Landed for 2.8

            jgmitter Joseph Gmitter (Inactive) added a comment

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15468/
            Subject: LU-6666 osc: Do not merge extents with partial pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ac1d6ab73c733caf2fa0ccf504955b23d3e572f0

            gerrit Gerrit Updater added a comment
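
            For readers without Gerrit access, here is a hedged sketch of the idea named in
            the patch subject - refusing to merge OSC extents when either one starts or ends
            mid-page - with hypothetical type and helper names; it is not the actual change:

            #include <stdbool.h>

            #define PAGE_CACHE_SIZE 4096U

            /* Hypothetical stand-in for the byte range an OSC extent covers. */
            struct osc_extent_sketch {
                    unsigned long start;  /* first byte covered */
                    unsigned long end;    /* one past the last byte covered */
            };

            static bool has_partial_page(const struct osc_extent_sketch *ext)
            {
                    return ext->start % PAGE_CACHE_SIZE != 0 ||
                           ext->end % PAGE_CACHE_SIZE != 0;
            }

            /* Merging when either extent has a partial page could leave a fragment
             * in the interior of the bulk, which the LASSERT above forbids. */
            static bool can_merge(const struct osc_extent_sketch *a,
                                  const struct osc_extent_sketch *b)
            {
                    return !has_partial_page(a) && !has_partial_page(b);
            }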

            I was just able to test this - it does resolve the assertions in our 4 MiB RPC
            testing. I don't have a specific, well-defined test case, however. And I'm still
            fairly sure we're not out of space or out of quota when hitting this.

            We're hitting this when running a few different tests. One of them is growfiles
            from LTP.

            This stupid little shell script hit it for us eventually, but only on nodes with
            a number of real CPUs - no luck in VMs. (The numbers before the tests are
            internal Cray test numbers):

            #29
            ARG1='-W gf29 -b -D 0 -r 1-4096 -R 0-33554432 -i 0 -L 60 -C 1 -u gfsparse-3-$$'
            #117
            ARG2='-W gf117 -b -e 1 -i 0 -L 120 -u -g 5000 -T 100 -t 499990 -C 10 -c 1000 -S 10 -f Lgf03'
            #309
            ARG3=' -W gf309 -b -e 1 -u -r 1-5000 -R 0--1 -i 0 -L 30 -C 1 -I p g_rand12 g_rand12.2'
            #701
            ARG4='-W gf701 -b -e 1 -u -i 0 -L 20 -w -C 1 -I r -T 10 glseek20 glseek20.2'
            #811
            ARG5='-W gf811 -b -e 1 -u -r 1-5000 -i 0 -L 30 -C 1 -I L g_lio15 g_lio15.2'
            #814
            ARG6='-W gf814 -b -e 1 -u -i 0 -L 20 -w -C 1 -T 10 glseek19 glseek19.2'
            #815
            ARG7='-W gf815 -b -e 1 -u -r 1-49600 -I r -u -i 0 -L 120 Lgfile1'
            
            (./growfiles $ARG1)&
            (./growfiles $ARG2)&
            (./growfiles $ARG3)&
            (./growfiles $ARG4)&
            (./growfiles $ARG5)&
            (./growfiles $ARG6)&
            (./growfiles $ARG7)&
            
            paf Patrick Farrell (Inactive) added a comment - edited

            Patrick, Artem, are you able to test the patch http://review.whamcloud.com/15468 to see if this resolves your problem?

            adilger Andreas Dilger added a comment

            Patrick, sorry for the long delay in replying. Having full-page holes in the
            middle of an RPC is fine, since a BRW RPC can contain multiple niobufs. The
            problem is with partial-page writes. That used to happen with liblustre (the
            deprecated userspace library), but I'm not sure it is even possible with llite
            (the kernel VFS client), since full pages are always written unless the page is
            at EOF - or possibly with O_DIRECT?

            The other question is whether this bug is still being hit with recent builds, or has it been fixed by something else and could be closed?

            adilger Andreas Dilger added a comment

            Andreas, Jinshan - Apologies if I'm misunderstanding, but more questions...

            You're saying an RPC can't have gaps in the middle. I'm wondering about the meaning of gap. Does that mean it's an issue to have any gap in the data in a bulk RPC, like two non-contiguous pages? Or is the meaning of 'gap' limited to a fragmented page?

            If the first meaning (including non-contiguous full pages) is correct, I don't
            see how Jinshan's patch keeps us safe by preventing the merging of extents with
            partial pages. (I don't see how the code enforces merging only contiguous
            extents, though the comment on get_write_extents implies that's the case.)

            paf Patrick Farrell (Inactive) added a comment

            Patrick, it might also happen if the user runs out of quota, I believe.

            The reason it isn't OK to merge multiple fragments is LNet RDMA: some RDMA
            implementations cannot pack data into the bulk RPC with gaps in the middle,
            only with a fragment at the end (which is very common).

            If the pages were cached on the client, it would be possible to simply expand
            the write to cover the whole page in the middle of the file, since the client
            needs to do a read-modify-write of the page anyway. With O_DIRECT that isn't
            possible. I'm not knowledgeable enough in CLIO to know whether the out-of-space
            handling could be changed to allow this or not.

            adilger Andreas Dilger added a comment
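
            Concretely (assuming 4 KiB pages): a bulk covering [2048, 9216) is fine, because
            the leading fragment [2048, 4096) ends on a page boundary and the trailing
            fragment [8192, 9216) starts on one; what cannot be expressed is a hole in the
            interior of the bulk, e.g. a first page carrying only [0, 1024) followed by
            later pages.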

            Jinshan - Are you saying the OST must be running out of space for this to
            happen? I'm almost certain that in the instance Artem is talking about (the dump
            is from Cray), the OST is not low on space.

            I'm puzzled as to why the OST running out of space would affect client caching.
            Have I misunderstood?

            One last thing - why is it specifically unsafe to merge two extents with partial
            pages? (If I'm reading this right, it is safe to merge an extent that has a
            partial page with one that doesn't.)

            paf Patrick Farrell (Inactive) added a comment

            This can only happen when the OST is running out of space, so the victim client
            can no longer cache pages and instead uses sync I/O to write them. For example,
            the first thread writes [0, 1024) and the second thread writes [8192, 9216);
            these writes do not conflict under range locks, so both can be picked up by the
            same RPC, which then hits this bug.

            jay Jinshan Xiong (Inactive) added a comment
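
            To spell out why that trips the assertion (assuming 4 KiB pages): the merged RPC
            has page_count == 2, and its first page has poff == 0 and count == 1024, so
            ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) fails because
            0 + 1024 != 4096 - exactly the LBUG reported above.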

            If the pages are cached on the client (i.e. not O_DIRECT), then the full page
            must be cached, since it isn't possible to mark only part of a page dirty - so
            I don't see how there can be multiple partial-page writes to the same file being
            merged?

            adilger Andreas Dilger added a comment - edited

            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: di.wang Di Wang
              Votes: 0
              Watchers: 13

              Dates

                Created:
                Updated:
                Resolved: