Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6666

osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE)

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • Lustre 2.8.0
    • None
    • 3
    • 9223372036854775807

    Description

      I met this during racer. This looks like LU-6227, but not in directIO patch.

       
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 6/27 pg: ffff8801efd6d640 off: 24576, count: 3272
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) LBUG
      Pid: 19811, comm: ptlrpcd_4
      
      Call Trace:
       [<ffffffffa116b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa116be77>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa08a4b55>] osc_brw_prep_request+0xc35/0x10a0 [osc]
       [<ffffffffa08b4471>] ? osc_req_attr_set+0x1b1/0x740 [osc]
       [<ffffffffa08a5859>] osc_build_rpc+0x899/0x15c0 [osc]
       [<ffffffffa08c0eda>] osc_io_unplug0+0x115a/0x1b40 [osc]
       [<ffffffffa08b9a83>] ? osc_ap_completion+0x213/0x600 [osc]
       [<ffffffffa156d8bb>] ? lu_object_put+0x12b/0x310 [obdclass]
       [<ffffffffa08c3e61>] osc_io_unplug+0x11/0x20 [osc]
       [<ffffffffa08a746f>] brw_interpret+0x9bf/0x1fa0 [osc]
       [<ffffffffa060eadc>] ? ptlrpc_free_committed+0x56c/0x770 [ptlrpc]
       [<ffffffffa061bdb2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
       [<ffffffffa0610772>] ? after_reply+0xcb2/0xeb0 [ptlrpc]
       [<ffffffffa0614ab1>] ptlrpc_check_set+0x331/0x1c70 [ptlrpc]
       [<ffffffff81087fdb>] ? try_to_del_timer_sync+0x7b/0xe0
       [<ffffffffa0642393>] ptlrpcd_check+0x533/0x550 [ptlrpc]
       [<ffffffffa06429cb>] ptlrpcd+0x35b/0x430 [ptlrpc]
       [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
       [<ffffffffa0642670>] ? ptlrpcd+0x0/0x430 [ptlrpc]
       [<ffffffff8109e66e>] kthread+0x9e/0xc0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      Attachments

        Issue Links

          Activity

            [LU-6666] osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE)

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/15468
            Subject: LU-6666 osc: Do not merge extents with partial pages
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: af031ebb20db32e0ab01a558b05442efcede5dbf

            gerrit Gerrit Updater added a comment - Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/15468 Subject: LU-6666 osc: Do not merge extents with partial pages Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: af031ebb20db32e0ab01a558b05442efcede5dbf

            Hi Artem,

            This issue is indeed introduced by LU-1669. When OST is out of space, and two processes write disjoint file range that don't overlap by page basis, the two writes can be merged into the same synchronous write RPC, and then hit this issue.

            This problem can be fixed by adding a mutex in vvp_io_commit_sync() to serialize sync threads.

            jay Jinshan Xiong (Inactive) added a comment - Hi Artem, This issue is indeed introduced by LU-1669 . When OST is out of space, and two processes write disjoint file range that don't overlap by page basis, the two writes can be merged into the same synchronous write RPC, and then hit this issue. This problem can be fixed by adding a mutex in vvp_io_commit_sync() to serialize sync threads.

            >Artem, were you doing direct I/O?
            Can't determine is this direct I/O from crash dump. I believe them have the same code path.

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - - edited >Artem, were you doing direct I/O? Can't determine is this direct I/O from crash dump. I believe them have the same code path.

            >Now that you can see pages in RPCs in question, can you post segments of continuous pages?

            Jinshan Xiong, I can't understed what exact information I need to post. Can you give some details.
            Post both brw_pages now.

            crash-7.0.1-29bit> search -K 0xffff88044b96cf78
            ffff88044a218728: ffff88044b96cf78 
            ffff88045392d858: ffff88044b96cf78 
            
            crash-7.0.1-29bit> brw_page ffff88044c771378
            struct brw_page {
              off = 7872511, 
              pg = 0xffffea000f8698d0, 
              count = 1, 
              flag = 8
            }
            crash-7.0.1-29bit> ffff88044b96cf78
            crash-7.0.1-29bit: command not found: ffff88044b96cf78
            crash-7.0.1-29bit> brw_page ffff88044b96cf78
            struct brw_page {
              off = 7884799, 
              pg = 0xffffea000f865500, 
              count = 1, 
              flag = 8
            }
            
            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - - edited >Now that you can see pages in RPCs in question, can you post segments of continuous pages? Jinshan Xiong, I can't understed what exact information I need to post. Can you give some details. Post both brw_pages now. crash-7.0.1-29bit> search -K 0xffff88044b96cf78 ffff88044a218728: ffff88044b96cf78 ffff88045392d858: ffff88044b96cf78 crash-7.0.1-29bit> brw_page ffff88044c771378 struct brw_page { off = 7872511, pg = 0xffffea000f8698d0, count = 1, flag = 8 } crash-7.0.1-29bit> ffff88044b96cf78 crash-7.0.1-29bit: command not found: ffff88044b96cf78 crash-7.0.1-29bit> brw_page ffff88044b96cf78 struct brw_page { off = 7884799, pg = 0xffffea000f865500, count = 1, flag = 8 }

            Artem, were you doing direct I/O? Otherwise this looks like a race between osc_refresh_count() and osc_page_touch_at(). Now that you can see pages in RPCs in question, can you post segments of continuous pages?

            jay Jinshan Xiong (Inactive) added a comment - Artem, were you doing direct I/O? Otherwise this looks like a race between osc_refresh_count() and osc_page_touch_at(). Now that you can see pages in RPCs in question, can you post segments of continuous pages?

            Is it possible that pages from output above can be in one rpc after landing patches from https://jira.hpdd.intel.com/browse/LU-1669?

            Range lock will be worked out to allow write to a shared file in parallel, while Posix compliance is still guaranteed.
            
            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - Is it possible that pages from output above can be in one rpc after landing patches from https://jira.hpdd.intel.com/browse/LU-1669? Range lock will be worked out to allow write to a shared file in parallel, while Posix compliance is still guaranteed.

            Some additional information. Thread A
            There are two objects

            struct osc_async_page {
             ...
              oap_obj_off = 0x23000, 
              oap_page_off = 0x873, 
            ...  
              count = 0x41d, 
            ...
            }
            
            struct osc_async_page {
              oap_obj_off = 0x26000, 
              oap_page_off = 0x5eb, 
            ...
                count = 0xa15, 
            }
            

            so LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed: happened
            thread B

            oap_obj_off = 0x1f000, 
              oap_page_off = 0x351, 
            ...
                count = 0x342, 
            ...
            
            and
            
            oap_obj_off = 0x4e000, 
              oap_page_off = 0x542, 
              count = 0x1c4, 
            

            so,

            LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - Some additional information. Thread A There are two objects struct osc_async_page { ... oap_obj_off = 0x23000, oap_page_off = 0x873, ... count = 0x41d, ... } struct osc_async_page { oap_obj_off = 0x26000, oap_page_off = 0x5eb, ... count = 0xa15, } so LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed: happened thread B oap_obj_off = 0x1f000, oap_page_off = 0x351, ... count = 0x342, ... and oap_obj_off = 0x4e000, oap_page_off = 0x542, count = 0x1c4, so, LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed

            Happened again with some another symptoms (LBUGs)

            LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed:
            LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed:
            LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) LBUG
            c0-0c0s10n1:
            LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 1/2 pg: ffff88044b96cf78 off: 7884799, count: 1
            LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) LBUG
            
            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - Happened again with some another symptoms (LBUGs) LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed: LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed: LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) LBUG c0-0c0s10n1: LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 1/2 pg: ffff88044b96cf78 off: 7884799, count: 1 LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) LBUG
            bobijam Zhenyu Xu added a comment -

            close it for now until we catch it again.

            bobijam Zhenyu Xu added a comment - close it for now until we catch it again.

            Hi, Bobi

            I tried a few times on my branch. Sorry I can not reproduce it. Maybe it is just temporary.

            di.wang Di Wang (Inactive) added a comment - Hi, Bobi I tried a few times on my branch. Sorry I can not reproduce it. Maybe it is just temporary.
            bobijam Zhenyu Xu added a comment -

            Can you provide how to reproduce the issue?

            bobijam Zhenyu Xu added a comment - Can you provide how to reproduce the issue?

            People

              jay Jinshan Xiong (Inactive)
              di.wang Di Wang (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: