Details
Type: Bug
Resolution: Duplicate
Priority: Critical
Description
mballoc shows odd behavior: the STREAM_ALLOC allocator head jumps suddenly after a target mount:
# grep -H "" /proc/fs/ldiskfs/md*/mb_last_group
/proc/fs/ldiskfs/md0/mb_last_group:0
/proc/fs/ldiskfs/md2/mb_last_group:0
# echo > /sys/kernel/debug/tracing/trace
# nobjlo=2 nobjhi=2 thrlo=1024 thrhi=1024 size=393216 rszlo=4096 rszhi=4096 tests_str="write" obdfilter-survey 2>&1 | tee /root/obdfilter-survey.log
Fri Dec 3 12:25:19 UTC 2021 Obdfilter-survey for case=disk from kjlmo1304
ost 2 sz 805306368K rsz 4096K obj 4 thr 2048 write 16552.35 [4580.64, 9382.91]
/usr/bin/iokit-libecho: line 236: 253095 Killed remote_shell $host "vmstat 5 >> $host_vmstatf" &>/dev/null
done!
# grep -H "" /proc/fs/ldiskfs/md*/mb_last_group
/proc/fs/ldiskfs/md0/mb_last_group:114337
/proc/fs/ldiskfs/md2/mb_last_group:130831
#
The streaming allocator head jumped straight to what was the first uninitialized group, which is now the last initialized group (the target filesystem is almost empty):
[root@kjlmo1304 ~]# dumpe2fs /dev/md0 | sed '/BLOCK/q' | tail -24
....
Group 114335: (Blocks 3746529280-3746562047) csum 0x1b7a [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319328 (bg #114176 + 160)
  Inode bitmap at 3741319584 (bg #114176 + 416)
  Inode table at 3741322225-3741322240 (bg #114176 + 3057)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746529280-3746562047
  Free inodes: 14634881-14635008
Group 114336: (Blocks 3746562048-3746594815) csum 0x37c1 [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319329 (bg #114176 + 161)
  Inode bitmap at 3741319585 (bg #114176 + 417)
  Inode table at 3741322241-3741322256 (bg #114176 + 3073)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746562048-3746594815
  Free inodes: 14635009-14635136
Group 114337: (Blocks 3746594816-3746627583) csum 0xbacd [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319330 (bg #114176 + 162)
  Inode bitmap at 3741319586 (bg #114176 + 418)
  Inode table at 3741322257-3741322272 (bg #114176 + 3089)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746594816-3746627583
  Free inodes: 14635137-14635264
Group 114338: (Blocks 3746627584-3746660351) csum 0xca57 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
The jump above is not big enough to affect performance. However, the same behavior was observed on another system with 2M block groups initialized: there the mb_last_group jump shifted block allocations on an empty filesystem over the middle of the disk device, with an approximately 15% write/read slowdown.
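To get a feel for how far such a jump moves allocations, a group number can be converted into a block number and a byte offset. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, the 32768 blocks per group is taken from the dumpe2fs output above, and the 4 KiB block size is an assumption (the report does not state it).

```c
#include <stdint.h>

/* 32768 blocks per group, as shown in the dumpe2fs output. */
#define BLOCKS_PER_GROUP 32768ULL
/* ASSUMPTION: 4 KiB blocks; the block size is not stated in the report. */
#define BLOCK_SIZE_BYTES 4096ULL

/* Hypothetical helper: first block of a block group,
 * assuming the first data block of the filesystem is 0. */
static uint64_t group_first_block(uint64_t group)
{
	return group * BLOCKS_PER_GROUP;
}

/* Hypothetical helper: byte offset on the device where a group starts. */
static uint64_t group_byte_offset(uint64_t group)
{
	return group_first_block(group) * BLOCK_SIZE_BYTES;
}
```

For group 114337 the first helper yields block 3746594816, which matches the range dumpe2fs prints for that group. Under the 4 KiB assumption each group covers 128 MiB, so the allocator head on md0 sits at group number times 128 MiB from the start of the device.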
It looks like this is due to the following checks in ldiskfs_mb_good_group():
/* We only do this if the grp has never been initialized */
if (unlikely(LDISKFS_MB_GRP_NEED_INIT(grp))) {
	int ret;

	/* cr=0/1 is a very optimistic search to find large
	 * good chunks almost for free. if buddy data is
	 * not ready, then this optimization makes no sense */
	if (cr < 2 && !ldiskfs_mb_uninit_on_disk(ac->ac_sb, group))
		return 0;
	ret = ldiskfs_mb_init_group(ac->ac_sb, group);
	if (ret)
		return 0;
}
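The decision made by the quoted check can be modelled in isolation. This is a simplified sketch for illustration only: the struct, the enum, and the function name are hypothetical stand-ins rather than the real ldiskfs types, and it assumes that ldiskfs_mb_uninit_on_disk() reports whether the group is still flagged uninitialized on disk (BLOCK_UNINIT). A failure of ldiskfs_mb_init_group() is not modelled.

```c
#include <stdbool.h>

/* Illustrative per-group state; not the real ldiskfs structures. */
struct group_state {
	bool need_init;       /* buddy data has not been loaded yet */
	bool uninit_on_disk;  /* ASSUMPTION: group is flagged BLOCK_UNINIT on disk */
};

enum check_result {
	CHECK_ALREADY_LOADED, /* the NEED_INIT branch is not taken */
	CHECK_SKIP_GROUP,     /* the quoted code returns 0 before initializing */
	CHECK_INIT_GROUP      /* the quoted code calls ldiskfs_mb_init_group() */
};

/* Mirrors the branch structure of the quoted check for criteria level cr. */
static enum check_result model_need_init_check(int cr,
					       const struct group_state *g)
{
	if (!g->need_init)
		return CHECK_ALREADY_LOADED;
	/* At cr=0/1 a group that is not loaded but is initialized on disk is skipped. */
	if (cr < 2 && !g->uninit_on_disk)
		return CHECK_SKIP_GROUP;
	return CHECK_INIT_GROUP;
}
```

Under this model, a cr=0 or cr=1 scan right after mount skips every group whose buddy data is not loaded yet unless the group is flagged uninitialized on disk, so the first group it initializes is the first uninitialized one. That would be consistent with the mb_last_group jump shown above.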
introduced by
ecb68b8 LU-13291 ldiskfs: mballoc don't skip uninit-on-disk groups
6a7a700 LU-12988 ldiskfs: skip non-loaded groups at cr=0/1