[LU-15319] Weird mballoc behaviour Created: 06/Dec/21 Updated: 25/Sep/23 Resolved: 25/Sep/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexander Zarochentsev | Assignee: | Alex Zhuravlev |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
mballoc behaves oddly: the STREAM_ALLOC allocator head jumps suddenly after a target mount:

# grep -H "" /proc/fs/ldiskfs/md*/mb_last_group
/proc/fs/ldiskfs/md0/mb_last_group:0
/proc/fs/ldiskfs/md2/mb_last_group:0
# echo > /sys/kernel/debug/tracing/trace
# nobjlo=2 nobjhi=2 thrlo=1024 thrhi=1024 size=393216 rszlo=4096 rszhi=4096 tests_str="write" obdfilter-survey 2>&1 | tee /root/obdfilter-survey.log
Fri Dec 3 12:25:19 UTC 2021 Obdfilter-survey for case=disk from kjlmo1304
ost 2 sz 805306368K rsz 4096K obj 4 thr 2048 write 16552.35 [4580.64, 9382.91]
/usr/bin/iokit-libecho: line 236: 253095 Killed remote_shell $host "vmstat 5 >> $host_vmstatf" &>/dev/null
done!
# grep -H "" /proc/fs/ldiskfs/md*/mb_last_group
/proc/fs/ldiskfs/md0/mb_last_group:114337
/proc/fs/ldiskfs/md2/mb_last_group:130831
#

The streaming allocator head jumped straight to the first non-initialized group, which is now the last initialized group (the target fs is almost empty):

[root@kjlmo1304 ~]# dumpe2fs /dev/md0 | sed '/BLOCK/q' | tail -24
....
Group 114335: (Blocks 3746529280-3746562047) csum 0x1b7a [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319328 (bg #114176 + 160)
  Inode bitmap at 3741319584 (bg #114176 + 416)
  Inode table at 3741322225-3741322240 (bg #114176 + 3057)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746529280-3746562047
  Free inodes: 14634881-14635008
Group 114336: (Blocks 3746562048-3746594815) csum 0x37c1 [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319329 (bg #114176 + 161)
  Inode bitmap at 3741319585 (bg #114176 + 417)
  Inode table at 3741322241-3741322256 (bg #114176 + 3073)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746562048-3746594815
  Free inodes: 14635009-14635136
Group 114337: (Blocks 3746594816-3746627583) csum 0xbacd [INODE_UNINIT, ITABLE_ZEROED]
  Block bitmap at 3741319330 (bg #114176 + 162)
  Inode bitmap at 3741319586 (bg #114176 + 418)
  Inode table at 3741322257-3741322272 (bg #114176 + 3089)
  32768 free blocks, 128 free inodes, 0 directories, 128 unused inodes
  Free blocks: 3746594816-3746627583
  Free inodes: 14635137-14635264
Group 114338: (Blocks 3746627584-3746660351) csum 0xca57 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]

The jump above is not big enough to affect performance, but the same behavior was observed on another system with 2M block groups initialized. There, the mb_last_group jump shifted block allocations on an empty fs past the middle of the disk device, with an approximately 15% write / read slowdown.

It looks like this is due to the following checks in ldiskfs_mb_good_group():
	/* We only do this if the grp has never been initialized */
	if (unlikely(LDISKFS_MB_GRP_NEED_INIT(grp))) {
		int ret;

		/* cr=0/1 is a very optimistic search to find large
		 * good chunks almost for free. if buddy data is
		 * not ready, then this optimization makes no sense */
		if (cr < 2 && !ldiskfs_mb_uninit_on_disk(ac->ac_sb, group))
			return 0;
		ret = ldiskfs_mb_init_group(ac->ac_sb, group);
		if (ret)
			return 0;
	}
introduced by:
ecb68b8 LU-13291 ldiskfs: mballoc don't skip uninit-on-disk groups
6a7a700 LU-12988 ldiskfs: skip non-loaded groups at cr=0/1 |
| Comments |
| Comment by Andreas Dilger [ 10/May/23 ] |
|
I suspect that this issue could be resolved with the new mballoc allocator from upstream kernels. |
| Comment by Andreas Dilger [ 25/Sep/23 ] |
|
The mballoc array-based group selection is almost ready to land in LU-14438 and I think that any development in that area should first start with backporting the next set of mballoc patches from upstream ext4, which address most of these issues. |