[LU-9657] Make Lustre ADIO driver work with PFL correctly Created: 13/Jun/17 Updated: 18/Jul/22 Resolved: 18/Jul/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Emoly Liu | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | pfl |
| Description |
|
The work includes:
The patch will ultimately be submitted upstream to MPICH. |
| Comments |
| Comment by Andreas Dilger [ 15/Jun/17 ] |
|
It should be noted that in cases where MPICH knows the total file size or the number of parallel writers in advance (I don't know how often that is true or not), then it is likely more efficient to just have it specify a single N-stripe file rather than using a PFL file. PFL files should mostly be used when the application doesn't know in advance how large the file is going to be, or the number of concurrent readers/writers. |
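For illustration, a minimal sketch of both cases using the llapi_layout_* API that this ticket moves the ADIO driver to (Lustre 2.10+); the component-API call pattern is an assumption based on lustreapi.h, and error handling is elided:

```c
/* Minimal sketch, assuming the Lustre 2.10+ llapi_layout component API
 * (lustreapi.h); error handling elided for brevity. */
#include <fcntl.h>
#include <lustre/lustreapi.h>

/* Known file size / writer count: a single wide plain layout. */
int create_plain(const char *path, uint64_t stripe_count)
{
	struct llapi_layout *layout = llapi_layout_alloc();
	int fd;

	llapi_layout_stripe_count_set(layout, stripe_count);
	llapi_layout_stripe_size_set(layout, 1 << 20);	/* 1 MiB stripes */
	fd = llapi_layout_file_create(path, O_CREAT | O_WRONLY, 0644, layout);
	llapi_layout_free(layout);
	return fd;
}

/* Unknown final size: a PFL file that starts narrow and widens at 4 MiB. */
int create_pfl(const char *path)
{
	struct llapi_layout *layout = llapi_layout_alloc();
	int fd;

	/* first component covers [0, 4 MiB) with a single stripe */
	llapi_layout_comp_extent_set(layout, 0, 4 << 20);
	llapi_layout_stripe_count_set(layout, 1);

	/* append a second component covering [4 MiB, EOF) with 4 stripes */
	llapi_layout_comp_add(layout);
	llapi_layout_comp_extent_set(layout, 4 << 20, LLAPI_LAYOUT_EOF);
	llapi_layout_stripe_count_set(layout, 4);

	fd = llapi_layout_file_create(path, O_CREAT | O_WRONLY, 0644, layout);
	llapi_layout_free(layout);
	return fd;
}
```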
| Comment by Emoly Liu [ 20/Jun/17 ] |
|
Now that the ADIO driver has been reworked to use the llapi_layout_* interfaces, it works correctly on a non-PFL file; next I will test it on a PFL file. BTW, I will later post a Lustre patch to add a new llapi_layout_* interface that turned out to be needed during my testing. |
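As a hedged sketch of what "reworked to use the llapi_layout_* interfaces" looks like when querying a file's striping (instead of the old LL_IOC_LOV_GETSTRIPE ioctl path), assuming the llapi_layout getters in lustreapi.h:

```c
/* Hedged sketch: querying a file's striping via the llapi_layout API. */
#include <stdint.h>
#include <stdio.h>
#include <lustre/lustreapi.h>

int print_striping(const char *path)
{
	struct llapi_layout *layout = llapi_layout_get_by_path(path, 0);
	uint64_t count, size;

	if (layout == NULL)
		return -1;

	llapi_layout_stripe_count_get(layout, &count);
	llapi_layout_stripe_size_get(layout, &size);
	printf("%s: stripe_count=%llu stripe_size=%llu\n", path,
	       (unsigned long long)count, (unsigned long long)size);

	llapi_layout_free(layout);
	return 0;
}
```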
| Comment by Gerrit Updater [ 21/Jun/17 ] |
|
Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/27752 |
| Comment by Emoly Liu [ 22/Jun/17 ] |
|
adilger, I have a question: do we need to let the user know whether the file layout is composite or not?
Which do you think is better? BTW, I am testing the ADIO patch on trevis nodes, and since I'm using IOR, the file is non-PFL. Do I need to add some hints to specify PFL striping information? Thanks for any advice! |
| Comment by Andreas Dilger [ 22/Jun/17 ] |
|
In general, I think composite and non-composite files should be treated similarly where possible. I wouldn't object to returning the one component in the FIRST/LAST case if that simplifies using these APIs. I'm not against adding a helper function to return whether the layout is composite or not, but I suspect there are already several ways to check this - component count, magic, etc. |
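A sketch of such a helper along the "component count" line mentioned above, assuming llapi_layout_comp_use() returns 0 while another component exists and non-zero once past the last one (Lustre 2.10+ component API):

```c
/* Sketch of an "is this layout composite?" helper via component count,
 * assuming llapi_layout_comp_use() returns 0 while a component exists
 * and non-zero once past the last one. */
#include <lustre/lustreapi.h>

static int layout_comp_count(struct llapi_layout *layout)
{
	int count = 0;

	if (llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_FIRST) != 0)
		return -1;
	do {
		count++;
	} while (llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_NEXT) == 0);

	return count;
}

static int layout_is_composite(struct llapi_layout *layout)
{
	return layout_comp_count(layout) > 1;
}
```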
| Comment by Gerrit Updater [ 28/Jun/17 ] |
|
Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/27865 |
| Comment by Gerrit Updater [ 28/Jun/17 ] |
|
Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/27869 |
| Comment by Emoly Liu [ 29/Jun/17 ] |
|
adilger, could you please review this ADIO patch at https://review.whamcloud.com/27869 ? I made the following changes:
The patch works correctly with IOR on a non-PFL file on my two local VMs, but it sometimes fails across multiple trevis nodes; IOR+POSIX+ADIO fails on trevis as well. The following is the output of a simple collective write test:
[root@centos7-2 C]# rm /mnt/lustre/iorfile
rm: remove regular file ‘/mnt/lustre/iorfile’? y
[root@centos7-2 C]# lfs osts
OBDS:
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
[root@centos7-2 C]# cat hostfile
centos7-2
centos7-3
[root@centos7-2 C]# mpirun -np 2 -machinefile ./hostfile /root/ior/src/C/IOR -a MPIIO -b 6M -o /mnt/lustre/iorfile -t 1M -v -c -w -r -W -i 1 -T 30 -k -U /root/ior/src/C/hint -H
IOR-2.10.3: MPI Coordinated Test of Parallel I/O
Run began: Thu Jun 29 11:05:32 2017
Command line used: /root/ior/src/C/IOR -a MPIIO -b 6M -o /mnt/lustre/iorfile -t 1M -v -c -w -r -W -i 1 -T 30 -k -U /root/ior/src/C/hint -H
Machine: Linux centos7-2
Start time skew across all tasks: 0.57 sec
Path: /mnt/lustre
FS: 1.2 GiB Used FS: 5.1% Inodes: 0.1 Mi Used Inodes: 0.3%
Participating tasks: 2
Summary:
api = MPIIO (version=3, subversion=1)
test filename = /mnt/lustre/iorfile
access = single-shared-file, collective
pattern = segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 2 (1 per node)
repetitions = 1
xfersize = 1 MiB
blocksize = 6 MiB
aggregate filesize = 12 MiB
hints passed to MPI_File_open() {
striping_factor = 2
striping_unit = 524288
directIO = disable
romio_lustre_co_ratio = 2
same_io_size = no
contiguous_data = yes
ds_in_coll = enable
big_req_size = 40960
}
hints returned from opened file {
direct_read = false
direct_write = false
romio_lustre_co_ratio = 2
romio_lustre_coll_threshold = 0
romio_lustre_ds_in_coll = enable
striping_unit = 524288
striping_factor = 2
cb_buffer_size = 16777216
romio_cb_read = automatic
romio_cb_write = automatic
cb_nodes = 2
romio_no_indep_rw = false
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
cb_config_list = *:1
romio_filesystem_type = LUSTRE:
romio_aggregator_list = 0 1
romio_lustre_start_iodevice = 3
}
Commencing write performance test.
Thu Jun 29 11:05:32 2017
Verifying contents of the file(s) just written.
Thu Jun 29 11:05:32 2017
hints passed to MPI_File_open() {
striping_factor = 2
striping_unit = 524288
directIO = disable
romio_lustre_co_ratio = 2
same_io_size = no
contiguous_data = yes
ds_in_coll = enable
big_req_size = 40960
}
hints returned from opened file {
direct_read = false
direct_write = false
romio_lustre_co_ratio = 2
romio_lustre_coll_threshold = 0
romio_lustre_ds_in_coll = enable
striping_unit = 524288
striping_factor = 2
cb_buffer_size = 16777216
romio_cb_read = automatic
romio_cb_write = automatic
cb_nodes = 2
romio_no_indep_rw = false
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
cb_config_list = *:1
romio_filesystem_type = LUSTRE:
romio_aggregator_list = 0 1
romio_lustre_start_iodevice = 3
}
hints passed to MPI_File_open() {
striping_factor = 2
striping_unit = 524288
directIO = disable
romio_lustre_co_ratio = 2
same_io_size = no
contiguous_data = yes
ds_in_coll = enable
big_req_size = 40960
}
hints returned from opened file {
direct_read = false
direct_write = false
romio_lustre_co_ratio = 2
romio_lustre_coll_threshold = 0
romio_lustre_ds_in_coll = enable
striping_unit = 524288
striping_factor = 2
cb_buffer_size = 16777216
romio_cb_read = automatic
romio_cb_write = automatic
cb_nodes = 2
romio_no_indep_rw = false
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
cb_config_list = *:1
romio_filesystem_type = LUSTRE:
romio_aggregator_list = 0 1
romio_lustre_start_iodevice = 3
}
Commencing read performance test.
Thu Jun 29 11:05:33 2017
Operation Max (MiB) Min (MiB) Mean (MiB) Std Dev Max (OPs) Min (OPs) Mean (OPs) Std Dev Mean (s) Op grep #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize
--------- --------- --------- ---------- ------- --------- --------- ---------- ------- --------
write 62.17 62.17 62.17 0.00 62.17 62.17 62.17 0.00 0.19302 2 1 1 0 0 1 0 0 1 6291456 1048576 12582912 -1 MPIIO EXCEL
read 138.39 138.39 138.39 0.00 138.39 138.39 138.39 0.00 0.08671 2 1 1 0 0 1 0 0 1 6291456 1048576 12582912 -1 MPIIO EXCEL
Max Write: 62.17 MiB/sec (65.19 MB/sec)
Max Read: 138.39 MiB/sec (145.11 MB/sec)
Run finished: Thu Jun 29 11:05:33 2017
[root@centos7-2 C]# lfs getstripe /mnt/lustre/iorfile
/mnt/lustre/iorfile
lmm_stripe_count: 2
lmm_stripe_size: 524288
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
obdidx objid objid group
0 10 0xa 0
1 10 0xa 0
[root@centos7-2 C]# ls -al /mnt/lustre/iorfile
-rw-r--r--. 1 root root 12582912 Jun 29 11:05 /mnt/lustre/iorfile
If this change for non-PFL is OK, I will move to add PFL hints and do some tests. |
| Comment by Andreas Dilger [ 04/Jul/17 ] |
|
It isn't totally clear why you are using the LCM of the stripe counts, instead of the LCM of (stripe_count * stripe_size) for each component. Also, if the first component is very small (1 stripe, small stripe size <= 1MB) then it should probably be skipped in this calculation, as it will not contribute significantly to the overall performance of the file. |
| Comment by Emoly Liu [ 05/Jul/17 ] |
|
The LCM of the stripe counts is used to calculate avail_cb_nodes, the number of MPI processes that will participate in a single collective read/write. |
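A minimal sketch of the calculation described above, under the assumption that avail_cb_nodes is the LCM of the per-component stripe counts capped at the number of ranks; the names are illustrative, not the actual ADIO driver code:

```c
/* Illustrative names only, not the actual ADIO driver code. */
#include <stdint.h>

static uint64_t gcd(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = b;
		b = a % b;
		a = t;
	}
	return a;
}

static uint64_t lcm(uint64_t a, uint64_t b)
{
	return a / gcd(a, b) * b;
}

/* avail_cb_nodes: LCM of all component stripe counts, capped at nprocs,
 * so I/O to every component stays stripe-aligned across the aggregators. */
static int calc_avail_cb_nodes(const uint64_t *stripe_counts, int ncomps,
			       int nprocs)
{
	uint64_t n = 1;
	int i;

	for (i = 0; i < ncomps; i++)
		n = lcm(n, stripe_counts[i]);

	return n < (uint64_t)nprocs ? (int)n : nprocs;
}
```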
| Comment by Gerrit Updater [ 19/Jul/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27752/ |
| Comment by James A Simmons [ 08/Aug/17 ] |
|
Has anyone pushed the PFL patch to the MPICH git repo yet? |
| Comment by Emoly Liu [ 09/Aug/17 ] |
|
The ADIO+PFL patch has not been pushed to the MPICH git repo yet. It can now set and parse the PFL layout parameters correctly via two new hints, "romio_lustre_pfl" and "romio_lustre_pfl_layout", but it always fails if the first component's stripe size is < 1MB. I'm investigating the issue and will then update the patch at https://review.whamcloud.com/27869 . |
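For context, this is how an application could pass those two hints directly through MPI_Info rather than via an IOR hint file; the hint keys are the ones named above from the patch under review, while the surrounding code is a generic MPI-IO sketch:

```c
/* Generic MPI-IO sketch; the two hint keys come from the patch under
 * review at https://review.whamcloud.com/27869 . */
#include <mpi.h>

MPI_File open_with_pfl_hints(MPI_Comm comm, const char *path)
{
	MPI_Info info;
	MPI_File fh;

	MPI_Info_create(&info);
	MPI_Info_set(info, "romio_lustre_pfl", "enable");
	MPI_Info_set(info, "romio_lustre_pfl_layout",
		     "-E 4M -c 2 -S 1M -E -1 -c 4 -S 512K");

	MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
		      info, &fh);
	MPI_Info_free(&info);
	return fh;
}
```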
| Comment by Emoly Liu [ 09/Aug/17 ] |
|
Here is my simple test to run ADIO+PFL on 4 OSTs:
[root@centos7-2 C]# cat hostfile
centos7-2
centos7-3
[root@centos7-2 C]# cat hint
IOR_HINT__MPI__romio_lustre_pfl=enable
IOR_HINT__MPI__romio_lustre_pfl_layout=-E 4M -c 2 -S 1M -E -1 -c 4 -S 512K
IOR_HINT__MPI__striping_factor=2
IOR_HINT__MPI__striping_unit=2097152
IOR_HINT__MPI__directIO=disable
IOR_HINT__MPI__romio_lustre_co_ratio=2
IOR_HINT__MPI__same_io_size=no
IOR_HINT__MPI__contiguous_data=yes
IOR_HINT__MPI__ds_in_coll=enable
IOR_HINT__MPI__big_req_size=40960
Here is the output:
[root@centos7-2 C]# mpirun -np 2 -machinefile ./hostfile /root/ior/src/C/IOR -a MPIIO -b 6M -o /mnt/lustre/iorfile -t 1M -v -c -w -W -i 1 -T 30 -k -U /root/ior/src/C/hint -H
IOR-2.10.3: MPI Coordinated Test of Parallel I/O
Run began: Wed Aug 9 11:49:09 2017
Command line used: /root/ior/src/C/IOR -a MPIIO -b 6M -o /mnt/lustre/iorfile -t 1M -v -c -w -W -i 1 -T 30 -k -U /root/ior/src/C/hint -H
Machine: Linux centos7-2
Start time skew across all tasks: 0.37 sec
Path: /mnt/lustre
FS: 1.2 GiB Used FS: 5.1% Inodes: 0.1 Mi Used Inodes: 0.3%
Participating tasks: 2
Summary:
api = MPIIO (version=3, subversion=1)
test filename = /mnt/lustre/iorfile
access = single-shared-file, collective
pattern = segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 2 (1 per node)
repetitions = 1
xfersize = 1 MiB
blocksize = 6 MiB
aggregate filesize = 12 MiB
hints passed to MPI_File_open() {
romio_lustre_pfl = enable
romio_lustre_pfl_layout = -E 4M -c 2 -S 1M -E -1 -c 4 -S 512K
striping_factor = 2
striping_unit = 2097152
directIO = disable
romio_lustre_co_ratio = 2
same_io_size = no
contiguous_data = yes
ds_in_coll = enable
big_req_size = 40960
}
hints returned from opened file {
direct_read = false
direct_write = false
romio_lustre_co_ratio = 2
romio_lustre_coll_threshold = 0
romio_lustre_ds_in_coll = enable
striping_unit = 2097152
striping_factor = 2
romio_lustre_pfl = enable
cb_config_list = *:1
cb_buffer_size = 16777216
romio_cb_read = automatic
romio_cb_write = automatic
cb_nodes = 2
romio_no_indep_rw = false
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
romio_filesystem_type = LUSTRE:
romio_aggregator_list = 0 1
}
Commencing write performance test.
Wed Aug 9 11:49:09 2017
ADIOI_LUSTRE_Calc_my_req(371): rank(0) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 0, len[0] = 1048576
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 6291456, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 6815744, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(0) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 1048576, len[0] = 1048576
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 7340032, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 7864320, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 8388608, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 8912896, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(0) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 2097152, len[0] = 1048576
ADIOI_LUSTRE_Calc_my_req(371): rank(0) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 3145728, len[0] = 1048576
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 9437184, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 9961472, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(0) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 4194304, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(0) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 4718592, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 10485760, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 11010048, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(0) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 5242880, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(0) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 5767168, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 0 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 11534336, len[0] = 524288
ADIOI_LUSTRE_Calc_my_req(371): rank(1) data needed from 1 (count = 1):
ADIOI_LUSTRE_Calc_my_req(374): off[0] = 12058624, len[0] = 524288
Verifying contents of the file(s) just written.
Wed Aug 9 11:49:09 2017
hints passed to MPI_File_open() {
romio_lustre_pfl = enable
romio_lustre_pfl_layout = -E 4M -c 2 -S 1M -E -1 -c 4 -S 512K
striping_factor = 2
striping_unit = 2097152
directIO = disable
romio_lustre_co_ratio = 2
same_io_size = no
contiguous_data = yes
ds_in_coll = enable
big_req_size = 40960
}
hints returned from opened file {
direct_read = false
direct_write = false
romio_lustre_co_ratio = 2
romio_lustre_coll_threshold = 0
romio_lustre_ds_in_coll = enable
striping_unit = 2097152
striping_factor = 2
romio_lustre_pfl = enable
cb_config_list = *:1
cb_buffer_size = 16777216
romio_cb_read = automatic
romio_cb_write = automatic
cb_nodes = 2
romio_no_indep_rw = false
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
romio_filesystem_type = LUSTRE:
romio_aggregator_list = 0 1
}
Operation Max (MiB) Min (MiB) Mean (MiB) Std Dev Max (OPs) Min (OPs) Mean (OPs) Std Dev Mean (s) Op grep #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize
--------- --------- --------- ---------- ------- --------- --------- ---------- ------- --------
write 56.14 56.14 56.14 0.00 56.14 56.14 56.14 0.00 0.21374 2 1 1 0 0 1 0 0 1 6291456 1048576 12582912 -1 MPIIO EXCEL
Max Write: 56.14 MiB/sec (58.87 MB/sec)
Run finished: Wed Aug 9 11:49:09 2017
Here is the layout of file iorfile:
[root@centos7-2 C]# ls -al /mnt/lustre/iorfile
-rw-r--r--. 1 root root 12582912 Aug 9 11:49 /mnt/lustre/iorfile
[root@centos7-2 C]# lfs getstripe /mnt/lustre/iorfile
/mnt/lustre/iorfile
lcm_layout_gen: 3
lcm_entry_count: 2
lcme_id: 1
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x1e:0x0] }
- 1: { l_ost_idx: 1, l_fid: [0x100010000:0x1e:0x0] }
lcme_id: 2
lcme_flags: init
lcme_extent.e_start: 4194304
lcme_extent.e_end: EOF
lmm_stripe_count: 4
lmm_stripe_size: 524288
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_objects:
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x10:0x0] }
- 1: { l_ost_idx: 2, l_fid: [0x100020000:0x10:0x0] }
- 2: { l_ost_idx: 0, l_fid: [0x100000000:0x1f:0x0] }
- 3: { l_ost_idx: 1, l_fid: [0x100010000:0x1f:0x0] }
|
| Comment by Andreas Dilger [ 09/Aug/17 ] |
|
Just to confirm, the Lustre ADIO driver should still work properly without the PFL hints - those are only hints to create a PFL file? What has been proposed for specifying a PFL template layout for nodemap is to give the FID (or in this case the pathname) of a directory with the desired layout template. That allows the user to create an arbitrarily complex layout for the output files, without having to specify a complex syntax to create the composite file. The other option is to allow a string of YAML to specify the layout, like what Bobijam has done for saving and restoring the layout on other files. That is especially true because after PFL files there will be FLR files, so the hint name should not be "*_pfl". |
| Comment by Cong Xu (Inactive) [ 09/Aug/17 ] |
|
One issue with pushing this patch to the MPICH git repo is that the code cannot be compiled against a regular Lustre file system, because on regular Lustre the PFL header file is missing and the PFL APIs for setting/getting the striping configuration are not supported. We need to figure out how to make this code work both on regular Lustre and on Lustre with the PFL feature. As for the hints-file question: yes, if the hints file is not provided, the Lustre ADIO driver still works. The file automatically inherits the striping configuration from its parent directory, and the driver uses a system call to obtain the file's striping information and calculate the aggregators. Conversely, when a hints file is provided, the Lustre ADIO driver creates the file on Lustre with the striping configuration given in the hints file. |
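One conventional way to address the build problem described above is a configure-time probe that guards the composite-layout code; a sketch, where HAVE_LLAPI_LAYOUT_COMP_ADD is a hypothetical macro that a check such as AC_CHECK_FUNCS([llapi_layout_comp_add]) would define:

```c
/* Sketch of a configure-time guard; HAVE_LLAPI_LAYOUT_COMP_ADD is a
 * hypothetical macro defined by a configure probe for the PFL llapi. */
#include <fcntl.h>
#include <lustre/lustreapi.h>

int adioi_lustre_create(const char *path)	/* illustrative name */
{
	struct llapi_layout *layout = llapi_layout_alloc();
	int fd;

	llapi_layout_stripe_count_set(layout, 2);
	llapi_layout_stripe_size_set(layout, 1 << 20);

#ifdef HAVE_LLAPI_LAYOUT_COMP_ADD
	/* PFL-capable Lustre: the plain layout above becomes the first
	 * component, [0, 4 MiB); append a wider tail component. */
	llapi_layout_comp_extent_set(layout, 0, 4 << 20);
	llapi_layout_comp_add(layout);
	llapi_layout_comp_extent_set(layout, 4 << 20, LLAPI_LAYOUT_EOF);
	llapi_layout_stripe_count_set(layout, 4);
#endif
	/* Pre-PFL Lustre compiles only the plain-striping path above. */

	fd = llapi_layout_file_create(path, O_CREAT | O_WRONLY, 0644, layout);
	llapi_layout_free(layout);
	return fd;
}
```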
| Comment by Emoly Liu [ 10/Aug/17 ] |
|
Thanks for Cong's reply. |
| Comment by Emoly Liu [ 21/Aug/17 ] |
|
Now the ADIO driver works correctly with the PFL feature by specifying a YAML template file. I still have some questions about this work:
I will post the current ADIO+PFL example later. |
| Comment by Emoly Liu [ 21/Aug/17 ] |
|
Here is my simple test to run ADIO+PFL on 4 OSTs:
[root@centos7-2 C]# cat hostfile
centos7-2
centos7-3
[root@centos7-2 C]# cat hint
IOR_HINT__MPI__romio_lustre_layout_yaml_temp=/root/ior/src/C/yaml_temp
IOR_HINT__MPI__striping_factor=2
IOR_HINT__MPI__striping_unit=1048576
IOR_HINT__MPI__directIO=disable
IOR_HINT__MPI__romio_lustre_co_ratio=2
IOR_HINT__MPI__same_io_size=no
IOR_HINT__MPI__contiguous_data=yes
IOR_HINT__MPI__ds_in_coll=enable
IOR_HINT__MPI__big_req_size=40960
[root@centos7-2 C]# cat yaml_temp
lcm_layout_gen:
lcm_entry_count: 4
component0:
lcme_id: 1
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
sub_layout:
lmm_stripe_count: 2
lmm_stripe_size: 524288
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
component1:
lcme_id: 2
lcme_flags: 0
lcme_extent.e_start: 4194304
lcme_extent.e_end: 8388608
sub_layout:
lmm_stripe_count: 4
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: -1
component2:
lcme_id: 3
lcme_flags: 0
lcme_extent.e_start: 8388608
lcme_extent.e_end: 12582912
sub_layout:
lmm_stripe_count: 4
lmm_stripe_size: 262144
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
component3:
lcme_id: 4
lcme_flags: 0
lcme_extent.e_start: 12582912
lcme_extent.e_end: EOF
sub_layout:
lmm_stripe_count: 2
lmm_stripe_size: 2097152
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: -1
Here is the output:
[root@centos7-2 C]# mpirun -np 2 -machinefile ./hostfile /root/ior/src/C/IOR -a MPIIO -b 6M -o /mnt/lustre/iorfile -t 1M -v -c -w -r -W -i 1 -T 30 -k -U /root/ior/src/C/hint -H
IOR-2.10.3: MPI Coordinated Test of Parallel I/O
Run began: Mon Aug 21 18:23:24 2017
Command line used: /root/ior/src/C/IOR -a MPIIO -b 6M -o /mnt/lustre/iorfile -t 1M -v -c -w -r -W -i 1 -T 30 -k -U /root/ior/src/C/hint -H
Machine: Linux centos7-2
Start time skew across all tasks: 0.42 sec
Path: /mnt/lustre
FS: 1.2 GiB Used FS: 5.1% Inodes: 0.1 Mi Used Inodes: 0.3%
Participating tasks: 2
Summary:
api = MPIIO (version=3, subversion=1)
test filename = /mnt/lustre/iorfile
access = single-shared-file, collective
pattern = segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 2 (1 per node)
repetitions = 1
xfersize = 1 MiB
blocksize = 6 MiB
aggregate filesize = 12 MiB
hints passed to MPI_File_open() {
romio_lustre_layout_yaml_temp = /root/ior/src/C/yaml_temp
striping_factor = 2
striping_unit = 1048576
directIO = disable
romio_lustre_co_ratio = 2
same_io_size = no
contiguous_data = yes
ds_in_coll = enable
big_req_size = 40960
}
hints returned from opened file {
direct_read = false
direct_write = false
romio_lustre_co_ratio = 2
romio_lustre_coll_threshold = 0
romio_lustre_ds_in_coll = enable
striping_unit = 1048576
striping_factor = 2
cb_config_list = *:1
cb_buffer_size = 16777216
romio_cb_read = automatic
romio_cb_write = automatic
cb_nodes = 2
romio_no_indep_rw = false
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
romio_filesystem_type = LUSTRE:
romio_aggregator_list = 0 1
}
Commencing write performance test.
Mon Aug 21 18:23:24 2017
Verifying contents of the file(s) just written.
Mon Aug 21 18:23:25 2017
hints passed to MPI_File_open() {
romio_lustre_layout_yaml_temp = /root/ior/src/C/yaml_temp
striping_factor = 2
striping_unit = 1048576
directIO = disable
romio_lustre_co_ratio = 2
same_io_size = no
contiguous_data = yes
ds_in_coll = enable
big_req_size = 40960
}
hints returned from opened file {
direct_read = false
direct_write = false
romio_lustre_co_ratio = 2
romio_lustre_coll_threshold = 0
romio_lustre_ds_in_coll = enable
striping_unit = 1048576
striping_factor = 2
cb_config_list = *:1
cb_buffer_size = 16777216
romio_cb_read = automatic
romio_cb_write = automatic
cb_nodes = 2
romio_no_indep_rw = false
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
romio_filesystem_type = LUSTRE:
romio_aggregator_list = 0 1
}
hints passed to MPI_File_open() {
romio_lustre_layout_yaml_temp = /root/ior/src/C/yaml_temp
striping_factor = 2
striping_unit = 1048576
directIO = disable
romio_lustre_co_ratio = 2
same_io_size = no
contiguous_data = yes
ds_in_coll = enable
big_req_size = 40960
}
hints returned from opened file {
direct_read = false
direct_write = false
romio_lustre_co_ratio = 2
romio_lustre_coll_threshold = 0
romio_lustre_ds_in_coll = enable
striping_unit = 1048576
striping_factor = 2
cb_config_list = *:1
cb_buffer_size = 16777216
romio_cb_read = automatic
romio_cb_write = automatic
cb_nodes = 2
romio_no_indep_rw = false
romio_cb_pfr = disable
romio_cb_fr_types = aar
romio_cb_fr_alignment = 1
romio_cb_ds_threshold = 0
romio_cb_alltoall = automatic
ind_rd_buffer_size = 4194304
ind_wr_buffer_size = 524288
romio_ds_read = automatic
romio_ds_write = automatic
romio_filesystem_type = LUSTRE:
romio_aggregator_list = 0 1
}
Commencing read performance test.
Mon Aug 21 18:23:25 2017
Operation Max (MiB) Min (MiB) Mean (MiB) Std Dev Max (OPs) Min (OPs) Mean (OPs) Std Dev Mean (s) Op grep #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize
--------- --------- --------- ---------- ------- --------- --------- ---------- ------- --------
write 15.84 15.84 15.84 0.00 15.84 15.84 15.84 0.00 0.75749 2 1 1 0 0 1 0 0 1 6291456 1048576 12582912 -1 MPIIO EXCEL
read 129.90 129.90 129.90 0.00 129.90 129.90 129.90 0.00 0.09238 2 1 1 0 0 1 0 0 1 6291456 1048576 12582912 -1 MPIIO EXCEL
Max Write: 15.84 MiB/sec (16.61 MB/sec)
Max Read: 129.90 MiB/sec (136.21 MB/sec)
Run finished: Mon Aug 21 18:23:25 2017
Here is the layout of file iorfile:
[root@centos7-2 C]# lfs getstripe /mnt/lustre/iorfile
/mnt/lustre/iorfile
lcm_layout_gen: 6
lcm_entry_count: 4
lcme_id: 1
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 4194304
lmm_stripe_count: 2
lmm_stripe_size: 524288
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x18:0x0] }
- 1: { l_ost_idx: 1, l_fid: [0x100010000:0x18:0x0] }
lcme_id: 2
lcme_flags: init
lcme_extent.e_start: 4194304
lcme_extent.e_end: 8388608
lmm_stripe_count: 4
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_objects:
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0xe:0x0] }
- 1: { l_ost_idx: 2, l_fid: [0x100020000:0xe:0x0] }
- 2: { l_ost_idx: 0, l_fid: [0x100000000:0x19:0x0] }
- 3: { l_ost_idx: 1, l_fid: [0x100010000:0x19:0x0] }
lcme_id: 3
lcme_flags: init
lcme_extent.e_start: 8388608
lcme_extent.e_end: 12582912
lmm_stripe_count: 4
lmm_stripe_size: 262144
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x1a:0x0] }
- 1: { l_ost_idx: 1, l_fid: [0x100010000:0x1a:0x0] }
- 2: { l_ost_idx: 2, l_fid: [0x100020000:0xf:0x0] }
- 3: { l_ost_idx: 3, l_fid: [0x100030000:0xf:0x0] }
lcme_id: 4
lcme_flags: 0
lcme_extent.e_start: 12582912
lcme_extent.e_end: EOF
lmm_stripe_count: 2
lmm_stripe_size: 2097152
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: -1
|
| Comment by Emoly Liu [ 24/Aug/17 ] |
|
I improved the code to use only one hint, "romio_lustre_comp_layout", which can specify the composite layout in 3 formats: the path of a YAML template file, the path of an existing Lustre file with the desired layout, or an lfs setstripe-style component string.
Here is an example of an IOR hint file:
IOR_HINT__MPI__romio_lustre_comp_layout=/root/ior/src/C/yaml_temp
#IOR_HINT__MPI__romio_lustre_comp_layout=/mnt/lustre/testfile
#IOR_HINT__MPI__romio_lustre_comp_layout=-E 4M -c 2 -S 512K -E 8M -c 4 -S 1M -E -1 -S 256K
The latter two formats are used when YAML support is not available. The patch has been updated at https://review.whamcloud.com/#/c/27869/9/ |
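An illustrative sketch (not the patch itself) of telling the formats apart: a hint value starting with "-E" is a component string, anything else is treated as a path, which the driver would then further distinguish as a YAML template or an existing Lustre file:

```c
/* Illustrative sketch, not the actual patch code. */
#include <string.h>

enum comp_layout_fmt {
	COMP_LAYOUT_OPTSTR,	/* "-E 4M -c 2 -S 512K ..." component string */
	COMP_LAYOUT_PATH,	/* YAML template file or existing Lustre file */
};

static enum comp_layout_fmt classify_comp_layout_hint(const char *value)
{
	/* Component strings start with "-E"; anything else is a path.
	 * The path case would then be split by checking whether the file
	 * parses as YAML or is an existing file whose layout can be copied. */
	if (strncmp(value, "-E", 2) == 0)
		return COMP_LAYOUT_OPTSTR;
	return COMP_LAYOUT_PATH;
}
```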
| Comment by Gerrit Updater [ 28/Aug/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27865/ |
| Comment by Robert Latham [ 18/Jul/22 ] |
|
I think it's correct to close this as fixed/resolved on your end, and the ball is in MPICH's court. I did not merge work into MPICH 4 years ago because we didn't have any PFL lustre to test on. I am sure I can find some PFL lustre nowadays and will revisit https://github.com/pmodels/mpich/pull/3290 |