now some results for the mballoc patches..
so with the fs filled as above (debugfs), where basically we get a very fragmented filesystem (20 free blocks followed by 80 busy blocks):
# time dd if=/dev/zero of=/mnt/huge/f11 bs=8k count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.521538 s, 15.7 kB/s
real 0m0.524s
and extra debugging from mballoc:
[ 5762.522831] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 2174 pref [ 166911 90072 384 ]
i.e. mballoc requested 1 block, set 512 as goal, prefetched 2174 bitmaps, then found 201 extents and preallocated 20 blocks.
it took ~167 msec (166911 usec; the bracketed timings are in usec, judging by the dd wall time) to issue IO to prefetch those groups (from SSD) and skip all uninitialized groups at cr=0.
then it took ~90 msec (90072 usec) to skip all uninitialized groups at cr=1.
and then very little time (384 usec) to scan one group and return something.
so that shouldn't get stuck at Lustre mount..
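for anyone who wants to post-process these debug lines, here is a minimal sketch of a parser. the field names are my own labels based on the explanation above, not names from the mballoc patch, and the number after "@" is not explained in this comment, so it is kept as an opaque value.

```python
import re
from typing import NamedTuple


class AllocTrace(NamedTuple):
    """One 'AC:' debug line. Field names are illustrative labels only."""
    orig: int        # blocks originally requested
    goal: int        # goal length after normalization
    best: int        # blocks in the best extent actually chosen
    found: int       # extents examined
    at: int          # value after '@' (meaning not stated in the comment)
    prefetched: int  # bitmaps prefetched
    t_cr0: int       # time spent at cr=0
    t_cr1: int       # time spent at cr=1
    t_scan: int      # time spent in the final scan


_AC_RE = re.compile(
    r"AC: (\d+) orig, (\d+) goal, (\d+) best, (\d+) found @ (\d+) (\d+) pref "
    r"\[ (\d+) (\d+) (\d+) \]"
)


def parse_ac(line: str) -> AllocTrace:
    """Parse a single 'AC:' line; raise ValueError if it does not match."""
    m = _AC_RE.search(line)
    if m is None:
        raise ValueError("not an AC debug line: %r" % line)
    return AllocTrace(*map(int, m.groups()))
```

e.g. parse_ac() on the line above gives orig=1, goal=512, best=20, found=201, prefetched=2174.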
but I think this level of fragmentation exposes another problem very well. say, all groups have been initialized finally.
now we try to write 8MB:
# time dd if=/dev/zero of=/mnt/huge/f10 bs=8M count=1
1+0 records in
1+0 records out
8388608 bytes (8.4 MB, 8.0 MiB) copied, 11.4156 s, 735 kB/s
real 0m11.418s
notice it's 11s ..
[ 5541.664107] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 76235 73909 13 ]
[ 5541.814086] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75747 73771 13 ]
[ 5541.964049] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75727 73776 12 ]
[ 5542.114082] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75681 73883 13 ]
[ 5542.269864] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75796 79530 13 ]
[ 5542.420171] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75875 73870 13 ]
i.e. it scans all groups at cr=0 and cr=1, taking ~75 msec each (the bracketed values are in usec, and the log lines are ~150 msec apart). and that repeats 256 times - 2048 blocks in 256 allocations, each finding 20 blocks as this is the largest chunk we can allocate from this filesystem.
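a quick cross-check of the timing units, as a sketch: if the three bracketed values are usec, their sum on each line should roughly equal the gap to the previous line's printk timestamp. this assumes each message is printed when its allocation finishes and that the allocations run back to back; the helper names are made up for illustration.

```python
import re

LOG = """\
[ 5541.664107] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 76235 73909 13 ]
[ 5541.814086] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75747 73771 13 ]
[ 5541.964049] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75727 73776 12 ]
[ 5542.114082] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75681 73883 13 ]
[ 5542.269864] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75796 79530 13 ]
[ 5542.420171] AC: 1 orig, 512 goal, 20 best, 201 found @ 2 0 pref [ 75875 73870 13 ]
"""

LINE_RE = re.compile(r"\[\s*(\d+)\.(\d{6})\] AC: .*\[ (\d+) (\d+) (\d+) \]")


def parse(log):
    """Return (timestamp_usec, summed_phase_time) for every AC line."""
    rows = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            sec, usec, t0, t1, t2 = map(int, m.groups())
            rows.append((sec * 1_000_000 + usec, t0 + t1 + t2))
    return rows


def gaps_vs_cost(rows):
    """Pair each timestamp gap with the phase-time sum of the later line."""
    return [(cur_ts - prev_ts, cur_cost)
            for (prev_ts, _), (cur_ts, cur_cost) in zip(rows, rows[1:])]


if __name__ == "__main__":
    for gap, cost in gaps_vs_cost(parse(LOG)):
        print(gap, cost, gap - cost)
```

for the six lines above every gap exceeds the summed phase time by well under 1 msec, which is consistent with the bracketed values being usec and with nearly all of the ~150 msec per allocation going into the cr=0 and cr=1 passes.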
that's a clear sign mballoc has to be able to exclude groups from even being checked, based on some fragmentation criteria? like a few lists containing groups.
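to illustrate the "few lists" idea, here is a toy sketch (userspace Python, not kernel code and not what any patch actually does): groups are bucketed by the power-of-two order of their largest free extent, so a request for N blocks only visits buckets that could possibly satisfy it. GroupIndex and its methods are hypothetical names.

```python
from collections import defaultdict


class GroupIndex:
    """Hypothetical index: groups bucketed by floor(log2(largest free extent))."""

    def __init__(self):
        self._bucket_of = {}              # group number -> current bucket order
        self._buckets = defaultdict(set)  # bucket order -> set of group numbers

    @staticmethod
    def _order(nblocks):
        # floor(log2(nblocks)) for nblocks >= 1; -1 for a group with no free extent
        return nblocks.bit_length() - 1

    def update(self, group, largest_free):
        """Record that `group` now has a largest free extent of `largest_free` blocks."""
        old = self._bucket_of.get(group)
        if old is not None:
            self._buckets[old].discard(group)
        new = self._order(largest_free)
        self._bucket_of[group] = new
        self._buckets[new].add(group)

    def candidates(self, want):
        """Yield groups that might hold a free extent of at least `want` (>= 1) blocks.

        Groups in lower-order buckets cannot satisfy the request and are
        never visited at all.
        """
        min_order = self._order(want)
        for order in sorted(o for o in self._buckets if o >= min_order):
            yield from sorted(self._buckets[order])


if __name__ == "__main__":
    idx = GroupIndex()
    for g in range(1000):
        idx.update(g, 20)                    # every group: largest free extent is 20 blocks
    print(len(list(idx.candidates(512))))    # no group is worth scanning for 512 blocks
    print(len(list(idx.candidates(20))))     # all groups are candidates for 20 blocks
```

with the fragmentation pattern from this test (largest free chunk of 20 blocks everywhere), a lookup for the 512-block goal would come back empty immediately instead of walking every group at cr=0 and cr=1.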
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37155/
Subject: LU-12988 osd: do not use preallocation during mount
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 2331cd1fa178b348d8aa048abbb5160ac9353461