[LU-19023] slow fallocate on large ldiskfs devices - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
None

Severity:
3

Description

An fallocate operation on a large ldiskfs device can be very slow.
For example, I created a 100T sparse file, formatted it as an ldiskfs filesystem, and mounted it as ldiskfs:

[root@rocky9 wc-master]# df -h /mnt/ldiskfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        100T  8.0T   87T   9% /mnt/ldiskfs
[root@rocky9 wc-master]#

The filesystem was then fragmented in a specific way, so that each block group has about 8% of its space used and 92% free:

time for x in {0..819200}; do fallocate -o $((x * 10))M -l 10M /mnt/ldiskfs/filler.ldiskfs ; echo $x > /proc/fs/ldiskfs/vdb/mb_last_group ; (( $x % 10000 == 0 )) && { echo -n $x -- ; date; };   done
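
The figures behind this loop can be checked with a little shell arithmetic. This is only a sketch: it assumes 4 KiB blocks and 32768 blocks per group (128 MiB groups), which the report does not state explicitly, and the function names are made up for illustration.

```shell
# Assumed geometry (not stated in the report): 4 KiB blocks, 32768 blocks per group.
block_kib=4
blocks_per_group=32768

# Size of one block group in MiB.
group_mib() { echo $(( block_kib * blocks_per_group / 1024 )); }

# Number of block groups in a filesystem of $1 TiB.
groups_in_fs() { echo $(( $1 * 1024 * 1024 / $(group_mib) )); }

# Share of a group filled by a chunk of $1 MiB, in tenths of a percent.
used_permille() { echo $(( $1 * 1000 / $(group_mib) )); }

echo "group size:   $(group_mib) MiB"
echo "block groups: $(groups_in_fs 100)"
pm=$(used_permille 10)
echo "used/group:   $(( pm / 10 )).$(( pm % 10 ))%"
```

Under those assumptions a 100T device has 819200 groups, and one 10M chunk per group fills roughly 8% of each group, which is in line with the 8.0T used that df reports.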

After that, an attempt to fallocate a 100G file takes more than 1 minute:

[root@rocky9 wc-master]# rmmod ldiskfs
[root@rocky9 wc-master]# insmod ldiskfs/ldiskfs.ko 
[root@rocky9 wc-master]# mount -t ldiskfs /dev/vdb /mnt/ldiskfs/
[root@rocky9 wc-master]# echo 1 > /sys/fs/ldiskfs/vdb/mb_stats 
[root@rocky9 wc-master]# rm -f /mnt/ldiskfs/big 
[root@rocky9 wc-master]# time fallocate -o 0  -l 100G /mnt/ldiskfs/big 

real	1m24.296s
user	0m0.001s
sys	1m24.142s
[root@rocky9 wc-master]#

Here are the mb_stats for the operation:

[root@rocky9 wc-master]# cat /proc/fs/ldiskfs/vdb/mb_stats 
mballoc:
	reqs: 875
	success: 4
	groups_scanned: 175074
	cr0_stats:
		hits: 0
		groups_considered: 0
		useless_loops: 0
		bad_suggestions: 0
	cr1_stats:
		hits: 4
		groups_considered: 713509254
		useless_loops: 0
		bad_suggestions: 0
	cr2_stats:
		hits: 0
		groups_considered: 713523200
		useless_loops: 871
	cr3_stats:
		hits: 871
		groups_considered: 176811
		useless_loops: 0
	extents_scanned: 175096
		goal_hits: 0
		2^n_hits: 0
		breaks: 871
		lost: 0
	buddies_generated: 820270/819200
	buddies_time_used: 1904314776
	preallocated: 0
	discarded: 0
[root@rocky9 wc-master]#

The mballoc stats show that the cr2 loop failed 800+ times, i.e. it failed for every allocation request. In other words, there were 800+ useless loops across all block groups in the filesystem.

(The 800+ figure comes from the maximum block allocation request size, which is about 32k blocks (128MB), and the total fallocated space: 100GB ~= 128MB x 800.)
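
The request count can be reproduced as follows. This is a rough sketch that assumes the 128 MiB cap per request mentioned above; the function names are hypothetical.

```shell
# Assumption: one mballoc request covers at most 32768 blocks of 4 KiB, i.e. 128 MiB.
max_request_mib=128

# Minimum number of requests needed to allocate $1 GiB (rounded up).
min_requests() { echo $(( ($1 * 1024 + max_request_mib - 1) / max_request_mib )); }

# Average MiB obtained per request when $1 GiB took $2 requests (rounded down).
avg_request_mib() { echo $(( $1 * 1024 / $2 )); }

echo "minimum requests for 100 GiB: $(min_requests 100)"
echo "average MiB per request at 871 cr3 hits: $(avg_request_mib 100 871)"
```

The 871 cr3 hits in mb_stats are somewhat above the 800 minimum. One plausible reading, not confirmed in the report, is that each request got about 117 MiB, close to the ~92% of a 128 MiB group that is still free.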

The statistics for cr1 are not good either:

	cr1_stats:
		hits: 4
		groups_considered: 713509254
		useless_loops: 0
		bad_suggestions: 0

"groups_considered: 713509254" means the allocator also tried essentially all block groups for each of the 800+ requests, but did not count those attempts as "useless loops".

My test filesystem has 819200 block groups; real systems may have ~5M+ groups, meaning such an allocation request would take 6-8 min or more.
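
A back-of-the-envelope check of these numbers is sketched below. It assumes the run time scales linearly with the number of block groups, which the report implies but does not measure, and it uses 5000000 as a round stand-in for "~5M+". The function names are hypothetical.

```shell
groups=819200   # block groups in the test filesystem

# Full passes over all groups implied by a groups_considered counter ($1).
full_scans() { echo $(( $1 / groups )); }

# Scale a measured time ($1 seconds on $2 groups) linearly to $3 groups.
scale_seconds() { echo $(( $1 * $3 / $2 )); }

echo "cr1: $(full_scans 713509254) full scans"
echo "cr2: $(full_scans 713523200) full scans"

est=$(scale_seconds 84 "$groups" 5000000)
echo "5M groups: ~${est}s ($(( est / 60 )) min $(( est % 60 )) s)"
```

The cr2 counter is exactly 871 x 819200, i.e. one full pass over every group per failed request, and the cr1 counter is only slightly lower. With linear scaling, the same 100G fallocate on 5M groups would land at around eight and a half minutes.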

This is really a generic problem with the ldiskfs block allocator; fallocate just makes it clearly visible because, unlike writes, fallocate requests are not limited by the BRW size.

The problem already existed in RHEL8.x, and the new, improved block allocator in RHEL9.x (linux-5.14) has not made it any better.

People

Assignee: Alexander Zarochentsev

Reporter: Alexander Zarochentsev

Votes: 0

Watchers: 6

Dates

Created: 15/May/25 6:12 PM

Updated: 23/May/25 6:28 PM