[LU-9375] llog files have less number of records than they designed Created: 20/Apr/17  Updated: 21/Apr/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Alexander Boyko Assignee: Bruno Faccini (Inactive)
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The default bitmap size for an llog file is 64768, so it can store 64767 records. llog_cat_add fills one plain llog file completely and then moves on to the next plain llog file. On current master this logic appears to be broken.
Single-node reproducer on top of master branch eb3379162c:

[root@localhost intelgerrit]# sh lustre/tests/llmount.sh
[root@localhost intelgerrit]# lctl --device lustre-MDT0000 changelog_register
lustre-MDT0000: Registered changelog userid 'cl1'
[root@localhost intelgerrit]# mkdir /mnt/lustre/test
[root@localhost intelgerrit]# lustre/tests/createmany -o /mnt/lustre/test/foo- 64768
 - open/close 10000 (time 1492674148.41 total 5.24 last 1909.71)
 - open/close 20000 (time 1492674153.50 total 10.33 last 1963.04)
 - open/close 30000 (time 1492674158.39 total 15.22 last 2044.79)
 - open/close 40000 (time 1492674163.59 total 20.42 last 1923.54)
 - open/close 50000 (time 1492674168.46 total 25.29 last 2055.37)
 - open/close 60000 (time 1492674173.32 total 30.15 last 2055.01)
total: 64768 open/close in 32.60 seconds: 1986.96 ops/second
[root@localhost ~]# debugfs -R "dump changelog_catalog changelog_catalog" /tmp/lustre-mdt1 
debugfs 1.42.13.x3 (26-Dec-2016)

[root@localhost ~]# llog_reader changelog_catalog  | tail -n13
#01 (064)id=[0x7:0x1:0x0]:0 path=oi.1/0x1:0x7:0x0
#02 (064)id=[0x8:0x1:0x0]:0 path=oi.1/0x1:0x8:0x0
#03 (064)id=[0x9:0x1:0x0]:0 path=oi.1/0x1:0x9:0x0
#04 (064)id=[0xa:0x1:0x0]:0 path=oi.1/0x1:0xa:0x0
#05 (064)id=[0xb:0x1:0x0]:0 path=oi.1/0x1:0xb:0x0
#06 (064)id=[0xc:0x1:0x0]:0 path=oi.1/0x1:0xc:0x0
#07 (064)id=[0xd:0x1:0x0]:0 path=oi.1/0x1:0xd:0x0
#08 (064)id=[0xe:0x1:0x0]:0 path=oi.1/0x1:0xe:0x0
#09 (064)id=[0xf:0x1:0x0]:0 path=oi.1/0x1:0xf:0x0
#10 (064)id=[0x10:0x1:0x0]:0 path=oi.1/0x1:0x10:0x0
#11 (064)id=[0x11:0x1:0x0]:0 path=oi.1/0x1:0x11:0x0
#12 (064)id=[0x12:0x1:0x0]:0 path=oi.1/0x1:0x12:0x0
#13 (064)id=[0x13:0x1:0x0]:0 path=oi.1/0x1:0x13:0x0

[root@localhost ~]# debugfs -R "dump /O/1/d7/7 plain_1" /tmp/lustre-mdt1 
debugfs 1.42.13.x3 (26-Dec-2016)
[root@localhost ~]# llog_reader plain_1 | tail
#11994 (128)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#11995 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#11996 (128)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#11997 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#11998 (128)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#11999 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#12000 (128)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#12001 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#12002 (128)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#12003 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)

[root@localhost ~]# debugfs -R "dump /O/1/d8/8 plain_2" /tmp/lustre-mdt1 
debugfs 1.42.13.x3 (26-Dec-2016)
[root@localhost ~]# llog_reader plain_2 | tail
#11627 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#11628 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#11629 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#11630 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#11631 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#11632 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#11633 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#11634 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#11635 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#11636 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)

[root@localhost ~]# debugfs -R "dump /O/1/d16/16 plain_10" /tmp/lustre-mdt1 
debugfs 1.42.13.x3 (26-Dec-2016)
[root@localhost ~]# llog_reader plain_10 | tail
#9529 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#9530 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#9531 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#9532 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#9533 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#9534 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#9535 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#9536 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)
#9537 (120)changelog record id:0x0 cr_flags:0x5043 cr_type:CLOSE(0xb)
#9538 (136)changelog record id:0x0 cr_flags:0x5000 cr_type:CREAT(0x1)

So every plain llog file stores only about 9.5k to 12k records instead of the expected 64767.



 Comments   
Comment by Bruno Faccini (Inactive) [ 20/Apr/17 ]

This seems to be caused by the following piece of code in llog_osd_write_rec():

static int llog_osd_write_rec(const struct lu_env *env,
                              struct llog_handle *loghandle,
                              struct llog_rec_hdr *rec,
                              struct llog_cookie *reccookie,
                              int idx, struct thandle *th)
{
.................
        if (loghandle->lgh_max_size > 0 &&
            lgi->lgi_off >= loghandle->lgh_max_size) {
                CDEBUG(D_OTHER, "llog is getting too large (%u > %u) at %u "
                       DOSTID"\n", (unsigned)lgi->lgi_off,
                       loghandle->lgh_max_size,
                       (int)loghandle->lgh_last_idx,
                       POSTID(&loghandle->lgh_id.lgl_oi));
                /* this is to signal that this llog is full */
                loghandle->lgh_last_idx = LLOG_HDR_BITMAP_SIZE(llh) - 1;
                RETURN(-ENOSPC);
        }
................

This new limit comes from the following patch:

commit 4724b52bba54ccdb0f81d0c63010b69e87e7f65c
Author: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Date:   Mon Jan 18 09:24:19 2016 +0300

    LU-6838 llog: limit file size of plain logs
    
    on small filesystems plain log can grow dramatically. especially
    given large record sizes produced by DNE and extended chunksize.
    I saw >50% of space consumed by a single llog file which was still
    in use. this leads to test failures (sanityn, etc).
    the patch introduces additional limit on plain llog size, which
    is calculated as <free space>/64 (128MB at most) at llog creation
    time.
    
    Change-Id: I0eab8177d4e416a32a6aab56d47e4142c81d13de
    Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
    Reviewed-on: https://review.whamcloud.com/18028
    Tested-by: Jenkins
    Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
    Tested-by: Maloo <hpdd-maloo@intel.com>
    Reviewed-by: wangdi <di.wang@intel.com>
    Reviewed-by: Mike Pershin <mike.pershin@intel.com>
    Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

So I believe you get such small plain llogs because of the very small MDT device created by llmount.sh. The number of entries varies from file to file because the ChangeLog records being written have different sizes.

Comment by Alexander Boyko [ 21/Apr/17 ]

Bruno Faccini, thanks for the clarification, you are right. But the commit message doesn't match the calculation in the code. It seems that we need more than about 8.2 GB of free space (64 x 128 MB = 8192 MB) to get past the 2 MB limit on a plain log.

the patch introduces additional limit on plain llog size, which is calculated as <free space>/64 (128MB at most) at llog creation time.

        loghandle->lgh_max_size = 2 << 20;
        dt = lu2dt_dev(cathandle->lgh_obj->do_lu.lo_dev);                             
        rc = dt_statfs(env, dt, &lgi->lgi_statfs);                                    
        if (rc == 0 && lgi->lgi_statfs.os_bfree > 0) {
                __u64 freespace = (lgi->lgi_statfs.os_bfree *                         
                                  lgi->lgi_statfs.os_bsize) >> 6;                     
                if (freespace < loghandle->lgh_max_size)                              
                        loghandle->lgh_max_size = freespace;                          
                /* shouldn't be > 128MB in any case?                                  
                 * it's 256K records of 512 bytes each */                             
                if (freespace > (128 << 20))
                        loghandle->lgh_max_size = 128 << 20;
        } 

