Lustre / LU-15319

Weird mballoc behaviour

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • Lustre 2.16.0

    Description

      Weird mballoc behavior: after a target mount, the STREAM_ALLOC allocator head (mb_last_group) jumps suddenly during the first write workload:

      # grep -H "" /proc/fs/ldiskfs/md*/mb_last_group
      /proc/fs/ldiskfs/md0/mb_last_group:0
      /proc/fs/ldiskfs/md2/mb_last_group:0
      # echo > /sys/kernel/debug/tracing/trace
      # nobjlo=2 nobjhi=2 thrlo=1024 thrhi=1024 size=393216 rszlo=4096 rszhi=4096 tests_str="write" obdfilter-survey 2>&1 | tee /root/obdfilter-survey.log
      Fri Dec  3 12:25:19 UTC 2021 Obdfilter-survey for case=disk from kjlmo1304
      ost  2 sz 805306368K rsz 4096K obj    4 thr 2048 write 16552.35 [4580.64, 9382.91] 
      /usr/bin/iokit-libecho: line 236: 253095 Killed                  remote_shell $host "vmstat 5 >> $host_vmstatf" &>/dev/null
      done!
      # grep -H "" /proc/fs/ldiskfs/md*/mb_last_group
      /proc/fs/ldiskfs/md0/mb_last_group:114337
      /proc/fs/ldiskfs/md2/mb_last_group:130831
      #
      

      The streaming allocator head jumped straight to what was the first uninitialized group, which is now the last initialized group (the target fs is almost empty):

      [root@kjlmo1304 ~]# dumpe2fs /dev/md0 | sed '/BLOCK/q' | tail -24
      ....
      Group 114335: (Blocks 3746529280-3746562047) csum 0x1b7a [INODE_UNINIT, ITABLE_ZEROED]
        Block bitmap at 3741319328 (bg #114176 + 160)
        Inode bitmap at 3741319584 (bg #114176 + 416)
        Inode table at 3741322225-3741322240 (bg #114176 + 3057)
        32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 3746529280-3746562047
        Free inodes: 14634881-14635008
      Group 114336: (Blocks 3746562048-3746594815) csum 0x37c1 [INODE_UNINIT, ITABLE_ZEROED]
        Block bitmap at 3741319329 (bg #114176 + 161)
        Inode bitmap at 3741319585 (bg #114176 + 417)
        Inode table at 3741322241-3741322256 (bg #114176 + 3073)
        32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 3746562048-3746594815
        Free inodes: 14635009-14635136
      Group 114337: (Blocks 3746594816-3746627583) csum 0xbacd [INODE_UNINIT, ITABLE_ZEROED]
        Block bitmap at 3741319330 (bg #114176 + 162)
        Inode bitmap at 3741319586 (bg #114176 + 418)
        Inode table at 3741322257-3741322272 (bg #114176 + 3089)
        32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
        Free blocks: 3746594816-3746627583
        Free inodes: 14635137-14635264
      Group 114338: (Blocks 3746627584-3746660351) csum 0xca57 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
      

      This jump is not large enough to affect performance. However, the same behavior was observed on another system with about 2M block groups initialized. There, the mb_last_group jump shifted block allocations on an empty fs past the middle of the disk device, causing an approximately 15% write / read slowdown.
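
      To put the jump in perspective, the position of the new allocator head on the device can be estimated from the dumpe2fs output above. This is a minimal sketch, assuming a 4 KiB block size (suggested by the 32768 blocks per group, but not stated in this ticket) and a first data block of 0. The helper names are hypothetical.

```c
#include <stdint.h>

/* Blocks per group, as shown by dumpe2fs (32768 free blocks in an empty group). */
#define BLOCKS_PER_GROUP 32768ULL

/* ASSUMPTION: 4 KiB blocks; the ticket does not state the block size. */
#define BLOCK_SIZE 4096ULL

/* First block of a block group, assuming the first data block is 0. */
uint64_t group_first_block(uint64_t group)
{
	return group * BLOCKS_PER_GROUP;
}

/* Byte offset of the start of a block group on the device. */
uint64_t group_byte_offset(uint64_t group)
{
	return group_first_block(group) * BLOCK_SIZE;
}
```

      For group 114337 (mb_last_group on md0), group_first_block() gives 3746594816, which matches the dumpe2fs line for that group. Under the 4 KiB assumption the head sits roughly 14 TiB into the device.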

      This appears to be caused by the following checks in ldiskfs_mb_good_group():

              /* We only do this if the grp has never been initialized */
              if (unlikely(LDISKFS_MB_GRP_NEED_INIT(grp))) {
                      int ret;
      
                      /* cr=0/1 is a very optimistic search to find large
                       * good chunks almost for free. if buddy data is
                       * not ready, then this optimization makes no sense */
      
                      if (cr < 2 && !ldiskfs_mb_uninit_on_disk(ac->ac_sb, group))
                              return 0;
                      ret = ldiskfs_mb_init_group(ac->ac_sb, group);
                      if (ret)
                              return 0;
              }
      
      

      introduced by

      ecb68b8 LU-13291 ldiskfs: mballoc don't skip uninit-on-disk groups
      6a7a700 LU-12988 ldiskfs: skip non-loaded groups at cr=0/1 
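
      A toy model of how the quoted check could produce the jump, under the hypothesis in this ticket. It assumes that right after mount no group has its buddy data loaded, that ldiskfs_mb_uninit_on_disk() corresponds to the BLOCK_UNINIT flag, and that the scan accepts the first group passing the check. The struct and function names are hypothetical, and the model ignores every other check in ldiskfs_mb_good_group() and in the allocator loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-group state; not the real ldiskfs structures. */
struct toy_group {
	bool need_init;      /* in-memory buddy data not loaded yet */
	bool uninit_on_disk; /* ASSUMPTION: BLOCK_UNINIT set on disk */
};

/* Mirrors only the quoted check: at cr < 2 a not-yet-loaded group is
 * skipped unless it is uninitialized on disk; otherwise it gets loaded. */
bool toy_good_group(struct toy_group *g, int cr)
{
	if (g->need_init) {
		if (cr < 2 && !g->uninit_on_disk)
			return false;
		g->need_init = false; /* stands in for ldiskfs_mb_init_group() */
	}
	return true;
}

/* Linear scan from 'start' with wraparound. Returns the index of the
 * first accepted group, or ngroups if no group is accepted. */
size_t toy_first_good_group(struct toy_group *groups, size_t ngroups,
			    size_t start, int cr)
{
	for (size_t i = 0; i < ngroups; i++) {
		size_t idx = (start + i) % ngroups;

		if (toy_good_group(&groups[idx], cr))
			return idx;
	}
	return ngroups;
}
```

      With eight groups of which the first five are initialized on disk and none are loaded, a cr=0 scan from group 0 returns index 5. In this model that is the observed jump: every initialized but not yet loaded group is skipped, and the first BLOCK_UNINIT group wins.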
      

            People

              bzzz Alex Zhuravlev
              zam Alexander Zarochentsev