Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Fix Version/s: None
- Affects Version/s: Lustre 2.14.0
- Labels: None
- Branch: master
- Severity: 3
Description
Here is the tested workload: 4k random write, FPP (file per process).

[randwrite]
ioengine=libaio
rw=randwrite
blocksize=4k
iodepth=4
direct=1
size=${SIZE}
runtime=60
numjobs=16
group_reporting
directory=/ai400x/out
create_serialize=0
filename_format=f.$jobnum.$filenum
In this test case, each of the 2 clients runs 16 fio processes, and each fio process does 4k random writes to its own file (the client/server setup is sketched below).
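For reference, a minimal sketch of the fio client/server layout assumed above; the hostfile contents and server invocation are not shown in this ticket, so the hostnames below are placeholders.

# on each of the two client nodes, start the fio backend that --client connects to
fio --server

# on the controller node, the hostfile passed to --client lists one client per line
cat > hostfile <<EOF
client01
client02
EOF

With numjobs=16 in the job file, each listed client runs 16 fio processes, matching the description above.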
However, when the file size is large (128GB in this case), there is a huge performance impact. Here are the two test results.
1GB file
# SIZE=1g /work/ihara/fio.git/fio --client=hostfile randomwrite.fio
write: IOPS=16.8k, BW=65.5MiB/s (68.7MB/s)(3930MiB/60004msec); 0 zone resets

128GB file
# SIZE=128g /work/ihara/fio.git/fio --client=hostfile randomwrite.fio
write: IOPS=2894, BW=11.3MiB/s (11.9MB/s)(679MiB/60039msec)
From the CPU profiles I collected on the OSS for both cases, the 128GB file case shows heavy spinlock contention in ldiskfs_mb_new_block() and ldiskfs_mb_normalized_request(), accounting for 89% (14085/15823 samples) of the total ost_io_xx() time, versus 20% (1895/9296 samples) in the 1GB file case. Please see the attached flame graph.
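For reference, a flame graph like the attached one can be captured on the OSS with perf and the FlameGraph scripts; this is only a sketch, assuming stackcollapse-perf.pl and flamegraph.pl are available in the current directory, and the exact collection settings may differ from those used for the attachment.

# sample all CPUs with call stacks for 60s while the fio run is in progress
perf record -F 99 -a -g -- sleep 60

# fold the stacks and render the flame graph as an SVG
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > oss-flamegraph.svg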
Attachments
Issue Links
- is related to: LU-13765 "ldiskfs_mb_mark_diskspace_used:3472: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata" (Resolved)