Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6666

osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE)

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • Lustre 2.8.0
    • None
    • 3
    • 9223372036854775807

    Description

      I met this during racer. This looks like LU-6227, but not in directIO patch.

       
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      VFS: Error -28 occurred while creating quota.
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 6/27 pg: ffff8801efd6d640 off: 24576, count: 3272
      LustreError: 19811:0:(osc_request.c:1101:osc_brw_prep_request()) LBUG
      Pid: 19811, comm: ptlrpcd_4
      
      Call Trace:
       [<ffffffffa116b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa116be77>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa08a4b55>] osc_brw_prep_request+0xc35/0x10a0 [osc]
       [<ffffffffa08b4471>] ? osc_req_attr_set+0x1b1/0x740 [osc]
       [<ffffffffa08a5859>] osc_build_rpc+0x899/0x15c0 [osc]
       [<ffffffffa08c0eda>] osc_io_unplug0+0x115a/0x1b40 [osc]
       [<ffffffffa08b9a83>] ? osc_ap_completion+0x213/0x600 [osc]
       [<ffffffffa156d8bb>] ? lu_object_put+0x12b/0x310 [obdclass]
       [<ffffffffa08c3e61>] osc_io_unplug+0x11/0x20 [osc]
       [<ffffffffa08a746f>] brw_interpret+0x9bf/0x1fa0 [osc]
       [<ffffffffa060eadc>] ? ptlrpc_free_committed+0x56c/0x770 [ptlrpc]
       [<ffffffffa061bdb2>] ? ptlrpc_unregister_bulk+0xa2/0xac0 [ptlrpc]
       [<ffffffffa0610772>] ? after_reply+0xcb2/0xeb0 [ptlrpc]
       [<ffffffffa0614ab1>] ptlrpc_check_set+0x331/0x1c70 [ptlrpc]
       [<ffffffff81087fdb>] ? try_to_del_timer_sync+0x7b/0xe0
       [<ffffffffa0642393>] ptlrpcd_check+0x533/0x550 [ptlrpc]
       [<ffffffffa06429cb>] ptlrpcd+0x35b/0x430 [ptlrpc]
       [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
       [<ffffffffa0642670>] ? ptlrpcd+0x0/0x430 [ptlrpc]
       [<ffffffff8109e66e>] kthread+0x9e/0xc0
       [<ffffffff8100c20a>] child_rip+0xa/0x20
       [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
       [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      Attachments

        Issue Links

          Activity

            [LU-6666] osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE)

            Is it possible that pages from output above can be in one rpc after landing patches from https://jira.hpdd.intel.com/browse/LU-1669?

            Range lock will be worked out to allow write to a shared file in parallel, while Posix compliance is still guaranteed.
            
            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - Is it possible that pages from output above can be in one rpc after landing patches from https://jira.hpdd.intel.com/browse/LU-1669? Range lock will be worked out to allow write to a shared file in parallel, while Posix compliance is still guaranteed.

            Some additional information. Thread A
            There are two objects

            struct osc_async_page {
             ...
              oap_obj_off = 0x23000, 
              oap_page_off = 0x873, 
            ...  
              count = 0x41d, 
            ...
            }
            
            struct osc_async_page {
              oap_obj_off = 0x26000, 
              oap_page_off = 0x5eb, 
            ...
                count = 0xa15, 
            }
            

            so LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed: happened
            thread B

            oap_obj_off = 0x1f000, 
              oap_page_off = 0x351, 
            ...
                count = 0x342, 
            ...
            
            and
            
            oap_obj_off = 0x4e000, 
              oap_page_off = 0x542, 
              count = 0x1c4, 
            

            so,

            LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - Some additional information. Thread A There are two objects struct osc_async_page { ... oap_obj_off = 0x23000, oap_page_off = 0x873, ... count = 0x41d, ... } struct osc_async_page { oap_obj_off = 0x26000, oap_page_off = 0x5eb, ... count = 0xa15, } so LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed: happened thread B oap_obj_off = 0x1f000, oap_page_off = 0x351, ... count = 0x342, ... and oap_obj_off = 0x4e000, oap_page_off = 0x542, count = 0x1c4, so, LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed

            Happened again with some another symptoms (LBUGs)

            LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed:
            LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed:
            LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) LBUG
            c0-0c0s10n1:
            LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 1/2 pg: ffff88044b96cf78 off: 7884799, count: 1
            LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) LBUG
            
            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - Happened again with some another symptoms (LBUGs) LustreError: 4713:0:(osc_request.c:1812:osc_build_rpc()) ASSERTION( oap->oap_page_off == 0 ) failed: LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) ASSERTION( oap->oap_page_off + oap->oap_brw_page.count == ((1UL) << 12) ) failed: LustreError: 4718:0:(osc_request.c:1818:osc_build_rpc()) LBUG c0-0c0s10n1: LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) ASSERTION( page_count == 1 || (ergo(i == 0, poff + pg->count == PAGE_CACHE_SIZE) && ergo(i > 0 && i < page_count - 1, poff == 0 && pg->count == PAGE_CACHE_SIZE) && ergo(i == page_count - 1, poff == 0)) ) failed: i: 1/2 pg: ffff88044b96cf78 off: 7884799, count: 1 LustreError: 4661:0:(osc_request.c:1212:osc_brw_prep_request()) LBUG
            bobijam Zhenyu Xu added a comment -

            close it for now until we catch it again.

            bobijam Zhenyu Xu added a comment - close it for now until we catch it again.

            Hi, Bobi

            I tried a few times on my branch. Sorry I can not reproduce it. Maybe it is just temporary.

            di.wang Di Wang (Inactive) added a comment - Hi, Bobi I tried a few times on my branch. Sorry I can not reproduce it. Maybe it is just temporary.
            bobijam Zhenyu Xu added a comment -

            Can you provide how to reproduce the issue?

            bobijam Zhenyu Xu added a comment - Can you provide how to reproduce the issue?

            Sorry, I do not have that.

            di.wang Di Wang (Inactive) added a comment - Sorry, I do not have that.
            bobijam Zhenyu Xu added a comment -

            also need core dump

            bobijam Zhenyu Xu added a comment - also need core dump

            here you are

            di.wang Di Wang (Inactive) added a comment - here you are
            bobijam Zhenyu Xu added a comment -

            Di, is it reproducible? Can I get the core dump and logs?

            bobijam Zhenyu Xu added a comment - Di, is it reproducible? Can I get the core dump and logs?
            bobijam Zhenyu Xu added a comment -

            in osc_build_rpc(), and in "/* add pages into rpc_list to build BRW rpc */" section, it makes sure that the middle pages has oap_page_off ==0 && oap_count == PAGE_CACHE_SIZE, and the error log shows that the 7th page has count == 3272 (its offset is PAGE aligned), could it be the page cache has been truncated while the I/O is undergoing?

            bobijam Zhenyu Xu added a comment - in osc_build_rpc(), and in "/* add pages into rpc_list to build BRW rpc */" section, it makes sure that the middle pages has oap_page_off ==0 && oap_count == PAGE_CACHE_SIZE, and the error log shows that the 7th page has count == 3272 (its offset is PAGE aligned), could it be the page cache has been truncated while the I/O is undergoing?

            People

              jay Jinshan Xiong (Inactive)
              di.wang Di Wang (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: