[LU-6838] update llog become too big before it is destroyed Created: 11/Jul/15  Updated: 17/Dec/16  Resolved: 17/Dec/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Major
Reporter: Di Wang Assignee: Di Wang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6831 The ticket for tracking all DNE2 bugs Reopened
is related to LU-8794 update_log_dir consuming 1.1TB on MDT... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Update llog might be too big before it is destroyed

Consider the current llog destroy implementation:

/* returns negative on error; 0 if success; 1 if success & log destroyed */
int llog_cancel_rec(const struct lu_env *env, struct llog_handle *loghandle,
                    int index)
{
............
        if ((llh->llh_flags & LLOG_F_ZAP_WHEN_EMPTY) &&
            (llh->llh_count == 1) &&
            (loghandle->lgh_last_idx == LLOG_HDR_BITMAP_SIZE(llh) - 1)) {
                rc = llog_destroy(env, loghandle);
                if (rc < 0) {
                        /* Sigh, can not destroy the final plain llog, but
                         * the bitmap has been clearly, so the record can not
                         * be accessed anymore, let's return 0 for now, and
                         * the orphan will be handled by LFSCK. */
                        CERROR("%s: can't destroy empty llog #"DOSTID
                               "#%08x: rc = %d\n",
                               loghandle->lgh_ctxt->loc_obd->obd_name,
                               POSTID(&loghandle->lgh_id.lgl_oi),
                               loghandle->lgh_id.lgl_ogen, rc);
                        RETURN(0);
                }
                RETURN(LLOG_DEL_PLAIN);
        }      
}

So with llog chunk size = 32K, LLOG_HDR_BITMAP_SIZE() will be 261375, so the update llog object will not be destroyed until all of its bitmap is "FULL". By then its size will be about (each record is about 3K, or even more with a bigger stripe_count)

261375 * 3k = 760M.
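
The estimate above can be reproduced with a short sketch. The helper names and the 3K record size constant are illustrative assumptions (the record size is only the rough figure quoted in this ticket), not Lustre code:

```c
#include <stdint.h>

/* Assumed average update record size, taken from the ticket ("about 3K"). */
#define UPDATE_REC_SIZE_EST (3ULL * 1024)

/*
 * Hypothetical helper: bytes held by a plain llog once every bitmap slot
 * has been used, given the number of slots and an average record size.
 */
static uint64_t llog_worst_case_bytes(uint64_t bitmap_slots, uint64_t rec_size)
{
        return bitmap_slots * rec_size;
}

/* Convert bytes to whole MiB, rounding down. */
static uint64_t bytes_to_mib(uint64_t bytes)
{
        return bytes >> 20;
}
```

Calling llog_worst_case_bytes(261375, UPDATE_REC_SIZE_EST) and passing the result to bytes_to_mib() gives a value in the same range as the rough figure quoted above, and it grows linearly with record size, which is why a larger stripe_count makes the problem worse.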

So we should probably destroy the llog object earlier, instead of waiting until all of the bits have been filled.
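
One way to picture "destroying earlier" is to treat a plain llog as full either when the bitmap is exhausted or when the file passes a size cap. The sketch below is only an illustration under that assumption: struct toy_llog, its fields, and both functions are hypothetical stand-ins, not the Lustre structures and not necessarily how any patch for this ticket works.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for llog state; NOT the Lustre structs. */
struct toy_llog {
        bool     zap_when_empty; /* models LLOG_F_ZAP_WHEN_EMPTY */
        uint32_t count;          /* models llh_count (assumed to include the header) */
        uint32_t last_idx;       /* models lgh_last_idx */
        uint32_t bitmap_size;    /* models LLOG_HDR_BITMAP_SIZE(llh) */
        uint64_t cur_bytes;      /* hypothetical: current file size */
        uint64_t max_bytes;      /* hypothetical: size cap, 0 means no cap */
};

/* The log accepts no more records: bitmap exhausted or size cap reached. */
static bool toy_llog_is_full(const struct toy_llog *l)
{
        if (l->last_idx >= l->bitmap_size - 1)
                return true;
        return l->max_bytes != 0 && l->cur_bytes >= l->max_bytes;
}

/* Destroy only when the log is full and nothing but the header remains. */
static bool toy_llog_should_destroy(const struct toy_llog *l)
{
        return l->zap_when_empty && l->count == 1 && toy_llog_is_full(l);
}
```

With max_bytes set to 0 this reduces to the bitmap-only check in the excerpt above; with a nonzero cap, an emptied log becomes eligible for destruction long before all bitmap slots are consumed.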



 Comments   
Comment by Andreas Dilger [ 12/Jul/15 ]

This doesn't explain why it is bad that the llog file is large?

Comment by Di Wang [ 12/Jul/15 ]

Well, the update log might consume a lot of MDT space, and these update records will not really be deleted until all of the bitmap has been "used", according to the current llog implementation.

1. On a small MDT, the space will be used up by the update log.
2. The user might complain about seeing several GB of MDT space usage even when there are no dirs/files.

Comment by Gerrit Updater [ 18/Jan/16 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/18028
Subject: LU-6838 llog: limit file size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d050cac02a90e8f18f079554339fe593ec2319d9

Comment by Gerrit Updater [ 17/Dec/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/18028/
Subject: LU-6838 llog: limit file size of plain logs
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4724b52bba54ccdb0f81d0c63010b69e87e7f65c

Comment by Peter Jones [ 17/Dec/16 ]

Landed for 2.10
