kernel BUG at fs/jbd2/transaction.c:1033

    Description

      Running Lustre 2.4.0-19chaos, we recently had several OSS nodes crash on the following BUG:

      kernel BUG at fs/jbd2/transaction.c:1033

      They were all in an ll_ost_00_0?? thread with the same backtrace:

      __ldiskfs_handle_dirty_metadata
      ldiskfs_free_inode
      ldiskfs_delete_inode
      generic_delete_inode
      generic_drop_inode
      iput
      osd_object_delete
      lu_object_free
      lu_object_put
      ofd_object_put
      ofd_destroy_by_fid
      ofd_destroy
      ofd_destroy
      ost_handle
      ptlrpc_server_handle_request
      ptlrpc_main
      

      The failures were on the secure network, so I can't upload logs.

          Activity

            [LU-4382] kernel BUG at fs/jbd2/transaction.c:1033
            pjones Peter Jones added a comment -

            Landed for 2.6

            bogl Bob Glossman (Inactive) added a comment - backport to b2_5: http://review.whamcloud.com/9334
            bobijam Zhenyu Xu added a comment -

            Since OST does not use xattr inode, I've created another ticket for it (LU-4648).


            bzzz Alex Zhuravlev added a comment -

            > And also, if the 1st reservation has been almost precise with this evaluation but more credits are required in ext4_delete_inode(), why do we need to extend with the same number ?

            Because in some cases a truncate can't fit in a single transaction (this is true for other filesystems such as ZFS as well). It is also hard to compute credits for truncate up front, as that involves a tree traversal (so we'd have to traverse twice).

            One way to "compute" them is to extend handle_t with another counter and increment it in __ldiskfs_handle_dirty_metadata(), or to maintain some history. That way you can at least learn what code is involved.
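The counter idea in Alex's comment can be sketched in user space roughly as follows. This is only a toy model under stated assumptions: `cnt_handle`, `cnt_start()`, `cnt_dirty_metadata()` and `cnt_shortfall()` are hypothetical names, not the real `handle_t` or ldiskfs code, and the model reports an error where the kernel would hit the BUG.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for handle_t; all field names are hypothetical. */
struct cnt_handle {
    int reserved;      /* credits requested when the handle was started */
    int credits_left;  /* credits still available */
    int dirtied;       /* the extra counter: blocks actually dirtied */
};

static void cnt_start(struct cnt_handle *h, int credits)
{
    assert(credits >= 0);
    h->reserved = credits;
    h->credits_left = credits;
    h->dirtied = 0;
}

/*
 * Toy stand-in for __ldiskfs_handle_dirty_metadata(): bump the counter
 * first, then report (instead of crashing) when credits have run out.
 */
static int cnt_dirty_metadata(struct cnt_handle *h, const char *caller)
{
    h->dirtied++;
    if (h->credits_left <= 0) {
        fprintf(stderr, "%s: out of credits (reserved %d, dirtied %d)\n",
                caller, h->reserved, h->dirtied);
        return -1;
    }
    h->credits_left--;
    return 0;
}

/* How many credits the original reservation was short by. */
static int cnt_shortfall(const struct cnt_handle *h)
{
    return h->dirtied > h->reserved ? h->dirtied - h->reserved : 0;
}
```

Logging the caller together with the counter is what would show which code path consumed more credits than were reserved.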
            bfaccini Bruno Faccini (Inactive) added a comment - - edited
            I find the overall credit requirement very difficult to compute, given the number of nested handles/needs in the underlying
            routines called, and it also depends on the operation currently being processed …
            So, I can understand why the 2*EXT4_QUOTA_INIT_BLOCKS() credits needed in vfs_dq_init()->ext4_dquot_initialize()->dquot_initialize()
            could be ignored since they will not be used during a truncate, but then why do we also ignore the
            MAXQUOTAS*EXT4_QUOTA_TRANS_BLOCKS() (2*2 ?) for
            vfs_dq_free_inode()->mark_dquot_dirty()->ext4_write_dquot()->dquot_commit() ?

            And also, if the 1st reservation has been almost precise with this evaluation but more credits are required in
            ext4_delete_inode(), why do we need to extend with the same number ?
            
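The nested-handle concern raised in Bruno's comment can be illustrated with a small user-space model. It assumes the usual jbd2 behaviour in which a nested journal start from the same task returns the already running handle and does not add the newly requested credits. All names here (`nest_handle`, `nest_journal_start()`, and so on) are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy handle: ref models nesting depth, credits models remaining credits. */
struct nest_handle {
    int ref;
    int credits;
};

/* Stand-in for the per-task "current handle" pointer. */
static struct nest_handle *nest_current;
static struct nest_handle nest_storage;

static struct nest_handle *nest_journal_start(int nblocks)
{
    assert(nblocks >= 0);
    if (nest_current) {
        /* Nested start: reuse the running handle, nblocks is ignored. */
        nest_current->ref++;
        return nest_current;
    }
    nest_storage.ref = 1;
    nest_storage.credits = nblocks;
    nest_current = &nest_storage;
    return nest_current;
}

static void nest_journal_stop(struct nest_handle *h)
{
    if (--h->ref == 0)
        nest_current = NULL;
}

/* Dirty one metadata block; returns -1 where the kernel would assert. */
static int nest_dirty(struct nest_handle *h)
{
    if (h->credits <= 0)
        return -1;
    h->credits--;
    return 0;
}
```

In this model an outer handle started with 2 credits stays at 2 even when an inner routine asks for 4, so any blocks the inner routine dirties are charged against the outer reservation.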
            bobijam Zhenyu Xu added a comment - - edited

            Yes, I think you are right that counting the dquot initialization credits is unnecessary.

            Furthermore, I checked dquot_drop()->dqput(): if the dquot drop needs to release the quota structure, its operations also involve writing the dquot back to disk, so 2*EXT4_QUOTA_DEL_BLOCKS is probably enough for the whole set of quota operations.


            adilger Andreas Dilger added a comment -

            I don't think ext4_free_inode() needs to account for EXT4_QUOTA_INIT_BLOCKS() because the user and group should already have quota allocated at write or chown time. I don't think it is possible to delete a file that has not already been accounted for in the quota.
            bobijam Zhenyu Xu added a comment -

            Patch to verify the missed-quota-credits theory: http://review.whamcloud.com/9187

            bobijam Zhenyu Xu added a comment -

            ext4_delete_inode()->ext4_free_inode(handle, inode)

                    vfs_dq_init(inode);
                    ext4_xattr_delete_inode(handle, inode);
                    vfs_dq_free_inode(inode);
                    vfs_dq_drop(inode);
            

            in vfs_dq_init()->ext4_dquot_initialize() (which is added by the ldiskfs ext4-back-dquot-to patch)

            ext4_dquot_initialize
            static int ext4_dquot_initialize(struct inode *inode, int type)
            {       
                    handle_t *handle;
                    int ret, err;
                    
                    /* We may create quota structure so we need to reserve enough blocks */
                    handle = ext4_journal_start(inode, 2*EXT4_QUOTA_INIT_BLOCKS(inode->i_sb));                     
                    if (IS_ERR(handle))
                            return PTR_ERR(handle);
                    ret = dquot_initialize(inode, type);
                    err = ext4_journal_stop(handle);
                    if (!ret)      
                            ret = err;
                    return ret;     
            }       
            

            in vfs_dq_free_inode()

                    /* Dirtify all the dquots - this can block when journalling */
                    for (cnt = 0; cnt < MAXQUOTAS; cnt++)
                            if (dquot[cnt])
                                    mark_dquot_dirty(dquot[cnt]);
            

            and mark_dquot_dirty()->ext4_write_dquot()

            ext4_write_dquot
            static int ext4_write_dquot(struct dquot *dquot)
            {
                    int ret, err;
                    handle_t *handle;
                    struct inode *inode;
            
                    inode = dquot_to_inode(dquot);
                    handle = ext4_journal_start(inode,
                                                EXT4_QUOTA_TRANS_BLOCKS(dquot->dq_sb));
                    if (IS_ERR(handle))
                            return PTR_ERR(handle);
                    ret = dquot_commit(dquot);
                    err = ext4_journal_stop(handle);
                    if (!ret)
                            ret = err;
                    return ret;
            }
            

            and vfs_dq_drop()->ext4_dquot_drop()

            ext4_dquot_drop
                    handle_t *handle;
                    int ret, err;
                    
                    /* We may delete quota structure so we need to reserve enough blocks */
                    handle = ext4_journal_start(inode, 2*EXT4_QUOTA_DEL_BLOCKS(inode->i_sb));       
                    if (IS_ERR(handle)) {
                            /*
                             * We call dquot_drop() anyway to at least release references
                             * to quota structures so that umount does not hang.
                             */
                            dquot_drop(inode);
                            return PTR_ERR(handle);
                    }
                    ret = dquot_drop(inode);
                    err = ext4_journal_stop(handle);
                    if (!ret)
                            ret = err;
                    return ret;
            

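            So for the quota part, ext4_free_inode() may miss 2*EXT4_QUOTA_INIT_BLOCKS(inode->i_sb) + 2*2 + 2*EXT4_QUOTA_DEL_BLOCKS(inode->i_sb) credits, where the 2*2 term is MAXQUOTAS*EXT4_QUOTA_TRANS_BLOCKS for the dquot writes in vfs_dq_free_inode().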
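The sum above can be written out as a small helper to make the three terms explicit. This is only a sketch: the struct and function names are hypothetical, and the block counts are passed in as plain integers because the real values come from the EXT4_QUOTA_*_BLOCKS() macros and depend on the superblock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the macro values; not a kernel structure. */
struct quota_credit_params {
    int maxquotas;    /* number of quota types (MAXQUOTAS) */
    int init_blocks;  /* value of EXT4_QUOTA_INIT_BLOCKS(sb) */
    int trans_blocks; /* value of EXT4_QUOTA_TRANS_BLOCKS(sb) */
    int del_blocks;   /* value of EXT4_QUOTA_DEL_BLOCKS(sb) */
};

/* Credits possibly missed by the quota calls made from ext4_free_inode(). */
static int missed_quota_credits(const struct quota_credit_params *p)
{
    assert(p != NULL);

    int init_term  = 2 * p->init_blocks;              /* vfs_dq_init() */
    int write_term = p->maxquotas * p->trans_blocks;  /* vfs_dq_free_inode() */
    int drop_term  = 2 * p->del_blocks;               /* vfs_dq_drop() */

    return init_term + write_term + drop_term;
}
```

With purely illustrative inputs of maxquotas = 2, init_blocks = 10, trans_blocks = 2 and del_blocks = 8, the terms are 20 + 4 + 16 = 40. Elsewhere in this thread it is argued that the init term may not need to be counted at all.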

            green Oleg Drokin added a comment -

            It would be a great exercise for somebody other than Alex to go through this call path, count all the blocks that can possibly be journaled, list them all here for double verification, and thus arrive at the updated reservation number.


            People

              bobijam Zhenyu Xu
              morrone Christopher Morrone (Inactive)