  1. Lustre
  2. LU-19208

lmm_stripe_count: 0 in DoM component triggers divide-by-zero in MPIIO


Details

    • Bug
    • Resolution: Unresolved
    • Medium
    • None
    • Lustre 2.15.7
    • lustre-2.15.7_1.llnl-2.t4.x86_64
    • 3

    Description

      When PFL (Progressive File Layout) is used to create a file with a DoM (Data-on-MDT) component, the DoM component has lmm_stripe_count == 0:

      # lfs getstripe RVDC_03_03022_10_2023.h5 | head -n 15
      RVDC_03_03022_10_2023.h5
        lcm_layout_gen:    12
        lcm_mirror_count:  1
        lcm_entry_count:   7
          lcme_id:             1
          lcme_mirror_id:      0
          lcme_flags:          init
          lcme_extent.e_start: 0
          lcme_extent.e_end:   65536
            lmm_stripe_count:  0
            lmm_stripe_size:   65536
            lmm_pattern:       mdt
            lmm_layout_gen:    0
            lmm_stripe_offset: 0 

      This triggers a divide-by-zero in MPIIO in ADIOI_LUSTRE_Get_striping_info(), at the division by stripe_count here:

      avail_cb_nodes =
              stripe_count * ADIOI_MIN(nprocs_for_coll/stripe_count, CO);
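
      For illustration, here is a minimal standalone sketch of the same arithmetic with one possible guard. The names safe_avail_cb_nodes and MIN_OF are hypothetical and are not ROMIO code. Treating a zero stripe count as 1 is only an assumption about a reasonable fallback, not a confirmed fix.

```c
/* Stand-in for ROMIO's ADIOI_MIN; the name MIN_OF is hypothetical. */
#define MIN_OF(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: same arithmetic as the quoted expression, but a
 * non-positive stripe_count (as reported for a DoM component) is
 * replaced by 1 so the integer division cannot divide by zero. */
static int safe_avail_cb_nodes(int stripe_count, int nprocs_for_coll, int co)
{
    if (stripe_count <= 0)
        stripe_count = 1;   /* assumption: fall back to a single stripe */
    return stripe_count * MIN_OF(nprocs_for_coll / stripe_count, co);
}
```

      With this guard, a call such as safe_avail_cb_nodes(0, 8, 1) returns 1 instead of faulting, and a non-zero stripe count gives the same result as the unguarded expression.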
      

      The value in stripe_count originally comes from ADIOI_LUSTRE_Open(), via:

      err = ioctl(fd->fd_sys, LL_IOC_LOV_SETSTRIPE, lum);

      and

      fd->hints->striping_factor = lum->lmm_stripe_count;
      

       
      Is lmm_stripe_count 0 for DoM components because it has to be, or does Lustre simply ignore the field internally, so that its value has never mattered?

      I'm wondering if Lustre should change, or MPIIO, or both.
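
      If the change were made on the MPIIO side, one option might be to take the striping factor from a component other than the DoM one. The sketch below shows that idea in isolation. The struct comp_info, the enum, and pick_striping_factor are hypothetical simplifications and not the real Lustre or ROMIO types. Skipping components with lmm_pattern mdt and defaulting to 1 are assumptions about a sensible policy.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one layout component; the real
 * Lustre structures carry more fields and use different types. */
enum comp_pattern { COMP_PATTERN_RAID0, COMP_PATTERN_MDT };

struct comp_info {
    enum comp_pattern pattern;   /* stands in for lmm_pattern */
    int stripe_count;            /* stands in for lmm_stripe_count */
};

/* Hypothetical helper: return the stripe count of the first component
 * that is not a DoM component and has a positive stripe count, or 1
 * when no such component exists. */
static int pick_striping_factor(const struct comp_info *comps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (comps[i].pattern != COMP_PATTERN_MDT && comps[i].stripe_count > 0)
            return comps[i].stripe_count;
    }
    return 1;   /* assumption: one stripe is a safe default */
}
```

      For a layout like the one above, where the first component is DoM with a stripe count of 0, this would skip that entry and use the first striped component instead, so the later division never sees a zero.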

            People

              pjones Peter Jones
              ofaaland Olaf Faaland