Description
When PFL is used to create a file with a DoM component, the DoM component has lmm_stripe_count == 0:
# lfs getstripe RVDC_03_03022_10_2023.h5 | head -n 15
RVDC_03_03022_10_2023.h5
  lcm_layout_gen:    12
  lcm_mirror_count:  1
  lcm_entry_count:   7
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   65536
      lmm_stripe_count:  0
      lmm_stripe_size:   65536
      lmm_pattern:       mdt
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
This triggers a divide-by-zero in MPIIO in ADIOI_LUSTRE_Get_striping_info(), here:
    avail_cb_nodes =
        stripe_count * ADIOI_MIN(nprocs_for_coll/stripe_count, CO);
Where the value in stripe_count originally comes from ADIOI_LUSTRE_Open() via:
    err = ioctl(fd->fd_sys, LL_IOC_LOV_SETSTRIPE, lum);
and
    fd->hints->striping_factor = lum->lmm_stripe_count;
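If the fix were made on the MPIIO side, one possible guard is sketched below. This is only an illustration, not the ROMIO code: sketch_avail_cb_nodes() and SKETCH_MIN are hypothetical names, and treating a DoM component as a single stripe is an assumption that would need to be confirmed against how Lustre actually lays out such a component.

```c
#include <assert.h>

/* Local stand-in for ROMIO's ADIOI_MIN macro so the sketch compiles alone. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical helper (not part of ROMIO): compute the number of usable
 * collective-buffering nodes while tolerating a reported stripe count
 * of 0, as seen on a DoM component.
 */
static int sketch_avail_cb_nodes(int stripe_count, int nprocs_for_coll, int CO)
{
    /* Preconditions assumed by this sketch. */
    assert(nprocs_for_coll > 0 && CO > 0);

    /* Assumption: a DoM component behaves like one stripe here. */
    if (stripe_count <= 0)
        stripe_count = 1;

    /* Same expression as in the report, now safe to evaluate. */
    return stripe_count * SKETCH_MIN(nprocs_for_coll / stripe_count, CO);
}
```

With this guard, a stripe count of 0 yields the same result as a stripe count of 1 instead of faulting. The same clamp could instead be applied once, at the point where striping_factor is set from lmm_stripe_count, so that every later consumer sees a non-zero value.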
Is lmm_stripe_count 0 for DoM components because it has to be, or does Lustre ignore it internally, and so the value hasn't mattered?
I'm wondering if Lustre should change, or MPIIO, or both.