Details
Type: Bug
Resolution: Unresolved
Priority: Major
Description
An fallocate operation on a large disk device can be very slow.
Here is an example: I created a 100T sparse file, formatted it as an ldiskfs fs and mounted it as ldiskfs:
[root@rocky9 wc-master]# df -h /mnt/ldiskfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        100T  8.0T   87T   9% /mnt/ldiskfs
[root@rocky9 wc-master]#
The fs was fragmented in a special way, so that each block group has about 8% of its space used and 92% free:
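time for x in {0..819200}; do
    # reserve the next 10M chunk of the filler file
    fallocate -o $((x * 10))M -l 10M /mnt/ldiskfs/filler.ldiskfs
    # set the allocator's last-group hint to group $x
    echo $x > /proc/fs/ldiskfs/vdb/mb_last_group
    # print progress every 10000 iterations
    (( $x % 10000 == 0 )) && { echo -n $x -- ; date; }
done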
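The "about 8% used" figure can be checked with a little shell arithmetic. This is a sketch under two assumptions that the report does not state: a 4 KiB block size and the default 32768 blocks per group. Under those assumptions each group is 128 MiB, and 819200 groups add up to exactly the 100T device.

```shell
#!/usr/bin/env bash
# Assumed geometry (not stated in the report): 4 KiB blocks, 32768 blocks per group.
block_size=4096
blocks_per_group=32768
groups=819200          # number of block groups in the test fs
chunk_mib=10           # each fallocate in the filler loop reserves 10M

group_mib=$(( block_size * blocks_per_group / 1024 / 1024 ))   # size of one group in MiB
total_tib=$(( groups * group_mib / 1024 / 1024 ))              # whole fs in TiB
# Share of one group taken by a single 10M chunk, printed with two decimals.
pct=$(awk -v c="$chunk_mib" -v g="$group_mib" 'BEGIN { printf "%.2f", 100 * c / g }')

echo "group size: ${group_mib} MiB"
echo "fs size:    ${total_tib} TiB"
echo "used/group: ${pct}%"
```

Under these assumptions the script prints a 128 MiB group and a 100 TiB fs. It also shows 7.81% of each group taken by one 10M chunk, which matches the "about 8%" above.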
After that, an attempt to fallocate a 100G file takes more than 1 min:
[root@rocky9 wc-master]# rmmod ldiskfs
[root@rocky9 wc-master]# insmod ldiskfs/ldiskfs.ko
[root@rocky9 wc-master]# mount -t ldiskfs /dev/vdb /mnt/ldiskfs/
[root@rocky9 wc-master]# echo 1 > /sys/fs/ldiskfs/vdb/mb_stats
[root@rocky9 wc-master]# rm -f /mnt/ldiskfs/big
[root@rocky9 wc-master]# time fallocate -o 0 -l 100G /mnt/ldiskfs/big

real    1m24.296s
user    0m0.001s
sys     1m24.142s
[root@rocky9 wc-master]#
Here are the mb_stats for the operation:
[root@rocky9 wc-master]# cat /proc/fs/ldiskfs/vdb/mb_stats
mballoc:
    reqs: 875
    success: 4
    groups_scanned: 175074
    cr0_stats:
        hits: 0
        groups_considered: 0
        useless_loops: 0
        bad_suggestions: 0
    cr1_stats:
        hits: 4
        groups_considered: 713509254
        useless_loops: 0
        bad_suggestions: 0
    cr2_stats:
        hits: 0
        groups_considered: 713523200
        useless_loops: 871
    cr3_stats:
        hits: 871
        groups_considered: 176811
        useless_loops: 0
    extents_scanned: 175096
        goal_hits: 0
        2^n_hits: 0
        breaks: 871
        lost: 0
    buddies_generated: 820270/819200
    buddies_time_used: 1904314776
    preallocated: 0
    discarded: 0
[root@rocky9 wc-master]#
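To compare the allocation criteria at a glance, awk can pull the per-criterion counters out of mb_stats. This is a minimal sketch that assumes the file's usual multi-line layout: a `crN_stats:` header followed by indented `key: value` lines. The function name `mb_cr_summary` is hypothetical and not part of any existing tool.

```shell
#!/usr/bin/env bash
# mb_cr_summary (hypothetical helper): read mb_stats text on stdin and print,
# for every crN_stats section, its hits, groups_considered and useless_loops.
mb_cr_summary() {
    awk '
        # a "crN_stats:" header starts a new section; keep the short name (crN)
        $1 ~ /^cr[0-9]+_stats:$/ {
            cr = substr($1, 1, index($1, "_") - 1)
            order[++n] = cr
            next
        }
        # any other section header ends the current crN section
        $1 == "extents_scanned:" || $1 ~ /_stats:$/ { cr = ""; next }
        cr != "" && $1 == "hits:"              { hits[cr] = $2 }
        cr != "" && $1 == "groups_considered:" { gc[cr] = $2 }
        cr != "" && $1 == "useless_loops:"     { ul[cr] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                printf "%s hits=%s groups_considered=%s useless_loops=%s\n", c, hits[c], gc[c], ul[c]
            }
        }'
}
```

Run it as `mb_cr_summary < /proc/fs/ldiskfs/vdb/mb_stats`. With the numbers above, the cr2 line reads `cr2 hits=0 groups_considered=713523200 useless_loops=871`.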
The mballoc stats show that the cr2 loop failed 871 times, i.e. it failed for every allocation request that reached it: there were 800+ useless loops, each one going across all block groups in the fs.
(The 800+ comes from the maximum block allocation request size of about 32k blocks (128MB): the total fallocated space of 100GB ~= 128MB x 800.)
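Both numbers can be cross-checked with shell arithmetic. This sketch again assumes 4 KiB blocks. On that basis, a 32768-block request is 128 MiB, and 100 GiB divided by 128 MiB is exactly 800 requests. Separately, 871 failed cr2 passes times 819200 groups reproduces the cr2 `groups_considered` counter exactly, which is consistent with every cr2 pass walking every group.

```shell
#!/usr/bin/env bash
# Assumption: 4 KiB filesystem blocks (the block size is not given in the report).
block_size=4096
max_req_blocks=32768                            # ~32k blocks per mballoc request
req_mib=$(( max_req_blocks * block_size / 1024 / 1024 ))
fallocate_mib=$(( 100 * 1024 ))                 # fallocate -l 100G
requests=$(( fallocate_mib / req_mib ))         # number of max-size requests

groups=819200                                   # block groups in the test fs
cr2_useless_loops=871                           # from mb_stats
cr2_expected=$(( cr2_useless_loops * groups ))  # one full scan per failed pass

echo "request size:      ${req_mib} MiB"
echo "requests for 100G: ${requests}"
echo "871 x 819200 =     ${cr2_expected} (mb_stats cr2 groups_considered: 713523200)"
```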
The statistics for cr1 are not good either:
cr1_stats: hits: 4 groups_considered: 713509254 useless_loops: 0 bad_suggestions: 0
"groups_considered: 713509254" means the allocator also tried all block groups for each of the 800+ requests; it just didn't count those attempts as "useless loops".
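Dividing each `groups_considered` counter by the number of groups shows how many complete passes over the fs it represents. This minimal sketch uses awk for the floating-point division. cr1 comes out at just under 871 passes, almost the same as cr2.

```shell
#!/usr/bin/env bash
groups=819200                 # block groups in the test fs
cr1_considered=713509254      # cr1 groups_considered from mb_stats
cr2_considered=713523200      # cr2 groups_considered from mb_stats

# Equivalent number of full passes over all groups for each criterion.
cr1_passes=$(awk -v c="$cr1_considered" -v g="$groups" 'BEGIN { printf "%.2f", c / g }')
cr2_passes=$(awk -v c="$cr2_considered" -v g="$groups" 'BEGIN { printf "%.2f", c / g }')

echo "cr1: ${cr1_passes} full passes"
echo "cr2: ${cr2_passes} full passes"
```

This prints 870.98 passes for cr1 and 871.00 for cr2.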
My test fs has 819200 groups; real systems may have ~5M+ groups, which means the same allocation requests would take 6-8 min or more.
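The following is a crude extrapolation. It assumes that the fallocate time is dominated by the per-group scans and so grows linearly with the number of groups; that assumption is not measured here. The 5,000,000 figure is simply the low end of the "~5M+ groups" estimate. Under these assumptions, the 84.3 s measured above scales to roughly 515 s (about 8.6 min), which falls in the "or more" part of the estimate.

```shell
#!/usr/bin/env bash
# Assumption: fallocate time scales linearly with the number of block groups.
measured_s=84.296        # "real 1m24.296s" on the 819200-group test fs
test_groups=819200
target_groups=5000000    # low end of the "~5M+ groups" estimate

estimate=$(awk -v t="$measured_s" -v a="$test_groups" -v b="$target_groups" 'BEGIN {
    s = t * b / a
    printf "%.0f s (%.1f min)", s, s / 60
}')
echo "estimated: ${estimate}"
```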
This is a generic problem with the ldiskfs block allocator; fallocate just makes it clearly visible because, unlike writes, fallocate requests are not limited by the BRW size.
The problem existed in RHEL8.x, and the new, improved block allocator in RHEL9.x (linux-5.14) did not make it any better.