LU-15834: FLR: "lfs mirror extend" should take current OSTs into account

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      When running "lfs mirror extend" to add a mirror to an existing file, the OST(s) already used by the original layout do not appear to be taken into account if OST pools or indices are not explicitly specified:

      $ lfs getstripe /myth/tmp/flr
      /myth/tmp/flr
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      	obdidx		 objid		 objid		 group
      	     1	       2352362	     0x23e4ea	             0
      $ lfs mirror extend -N -c 1 /myth/tmp/flr
      $ lfs getstripe /myth/tmp/flr
      /myth/tmp/flr
        lcm_layout_gen:    1
        lcm_mirror_count:  2
        lcm_entry_count:   2
          lcme_id:             65537
          lcme_mirror_id:      1
          lcme_flags:          init
          lcme_extent.e_start: 0
          lcme_extent.e_end:   EOF
            lmm_stripe_count:  1
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: 1
            lmm_objects:
            - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x23e4ea:0x0] }
      
          lcme_id:             131073
          lcme_mirror_id:      2
          lcme_flags:          init
          lcme_extent.e_start: 0
          lcme_extent.e_end:   EOF
            lmm_stripe_count:  1
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: 1
            lmm_objects:
            - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x23e4eb:0x0] }
      

      In this case, both the original copy and the new mirror of the file are allocated on OST0001 because it has the most free space, which does not help the availability or performance of this file.

      A previous patch, https://review.whamcloud.com/32404 "LU-9007 lod: improve obj alloc for FLR file", added lod_should_avoid_ost() to handle new OST object allocation for the different components of a single mirror, but it does not appear to handle allocations across FLR mirrors.
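
      As a rough, self-contained illustration of the missing cross-mirror step (the struct comp layout, mark_avoided_osts(), and all field names below are hypothetical stand-ins, not the actual lod_qos data structures): when instantiating a component of a new mirror, the allocator would need to collect the OST indices already used by instantiated components of other mirrors whose extents overlap, and avoid them the same way lod_should_avoid_ost() avoids sibling components within one mirror.

      /*
       * Hypothetical, self-contained sketch (not actual lod_qos code): collect
       * the OST indices already used by instantiated components of other
       * mirrors whose extents overlap the one being allocated.
       */
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      struct comp {                /* simplified stand-in for a layout component */
          uint64_t start, end;     /* extent [start, end) */
          int mirror_id;
          int stripe_count;
          int ost_idx[4];          /* OSTs used by this component, if instantiated */
          bool init;               /* component is instantiated (lcme_flags: init) */
      };

      static bool extents_overlap(const struct comp *a, const struct comp *b)
      {
          return a->start < b->end && b->start < a->end;
      }

      /* Mark OSTs the new component should avoid: any OST already used by an
       * instantiated, overlapping component belonging to a different mirror. */
      static void mark_avoided_osts(const struct comp *new_comp,
                                    const struct comp *comps, int ncomps,
                                    bool *avoid, int ost_count)
      {
          for (int i = 0; i < ncomps; i++) {
              const struct comp *c = &comps[i];

              if (!c->init || c->mirror_id == new_comp->mirror_id ||
                  !extents_overlap(new_comp, c))
                  continue;
              for (int s = 0; s < c->stripe_count; s++)
                  if (c->ost_idx[s] >= 0 && c->ost_idx[s] < ost_count)
                      avoid[c->ost_idx[s]] = true;
          }
      }

      int main(void)
      {
          /* Existing file: mirror 1 has one component on OST0001 covering [0, EOF) */
          struct comp existing = { 0, UINT64_MAX, 1, 1, { 1 }, true };
          /* New mirror 2 component about to be instantiated over the same extent */
          struct comp new_comp = { 0, UINT64_MAX, 2, 1, { -1 }, false };
          bool avoid[4] = { false };

          mark_avoided_osts(&new_comp, &existing, 1, avoid, 4);
          for (int i = 0; i < 4; i++)
              printf("OST%04x: %s\n", i, avoid[i] ? "avoid" : "ok");
          return 0;
      }

      For the example file above, this would mark OST0001 as "avoid" before the second mirror is allocated, so the two replicas could not both land on OST0001.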

      Requirements for OST selection on components with overlapping extents should be, in order of decreasing priority (a rough sketch of how items 1-3 could be ranked follows the list):

      1. objects with overlapping components must not share the same OST. This implies mirror_count <= ost_count / component_stripe_count (for example, with 8 OSTs and a component stripe_count of 2, at most 4 mirrors can be kept fully separate). In theory this could be relaxed: if all replicas have the same stripe_count and stripe_size, it would only be necessary that the same OST not appear at the same stripe_index in different components, in which case the maximum replica count equals the OST count, but this is more difficult to control.
      2. objects with overlapping components should not share OSTs on the same OSS node (compared by NID from imp->imp_connection->c_peer.nid, as qos_add_tgt() does), to avoid the shared-node failure domain.
      3. objects with overlapping components should not share OSTs on the same OSS failover pair (compared by failover NID from imp->imp_conn_list.oic_conn->c_peer.nid, as lprocfs_import_seq_show() does), to avoid the shared storage enclosure/controller failure domain. There may be other OSS nodes that share the same storage enclosure/controller, but there is no way for the client to determine this automatically.
      4. objects with overlapping components should not be on OSTs that share the same network switch, power supply, rack, etc., but this depends on external information that is not currently available to Lustre. It could optionally be supplied via a separate configuration file or options, but the cases above already cover the most important failure scenarios.
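
      Below is a minimal sketch of how requirements 1-3 could be ranked when choosing an OST for a new overlapping component. The types, helper names (struct ost_info, classify(), pick_ost()), and numeric NID values are simplified, hypothetical stand-ins, not actual Lustre code.

      /*
       * Hypothetical sketch of the priority ordering above (1 > 2 > 3).
       */
      #include <stdint.h>
      #include <stdio.h>

      enum conflict {
          CONFLICT_NONE     = 0,   /* no sharing at all: best candidate  */
          CONFLICT_FAILOVER = 1,   /* same failover pair (priority 3)    */
          CONFLICT_OSS_NODE = 2,   /* same OSS node by NID (priority 2)  */
          CONFLICT_SAME_OST = 3,   /* same OST (priority 1: never allow) */
      };

      struct ost_info {
          int      idx;            /* OST index                          */
          uint64_t oss_nid;        /* NID of the primary OSS             */
          uint64_t failover_nid;   /* NID of the failover partner OSS    */
      };

      /* Worst conflict between a candidate OST and the OSTs already used by
       * overlapping components of other mirrors. */
      static enum conflict classify(const struct ost_info *cand,
                                    const struct ost_info *used, int nused)
      {
          enum conflict worst = CONFLICT_NONE;

          for (int i = 0; i < nused; i++) {
              enum conflict c = CONFLICT_NONE;

              if (cand->idx == used[i].idx)
                  c = CONFLICT_SAME_OST;
              else if (cand->oss_nid == used[i].oss_nid)
                  c = CONFLICT_OSS_NODE;
              else if (cand->oss_nid == used[i].failover_nid ||
                       cand->failover_nid == used[i].oss_nid ||
                       cand->failover_nid == used[i].failover_nid)
                  c = CONFLICT_FAILOVER;
              if (c > worst)
                  worst = c;
          }
          return worst;
      }

      /* Pick the candidate with the least severe conflict; reusing the same
       * OST is refused outright, matching requirement 1. */
      static int pick_ost(const struct ost_info *cands, int ncands,
                          const struct ost_info *used, int nused)
      {
          int best = -1;
          enum conflict best_c = CONFLICT_SAME_OST;

          for (int i = 0; i < ncands; i++) {
              enum conflict c = classify(&cands[i], used, nused);

              if (c < best_c) {
                  best_c = c;
                  best = i;
              }
          }
          return best;             /* -1: every candidate shares an OST */
      }

      int main(void)
      {
          /* OST0001 already holds the first mirror; OST0002 is on the same OSS,
           * OST0003 is on its failover partner, OST0004 is fully independent. */
          struct ost_info used[]  = { { 1, 100, 101 } };
          struct ost_info cands[] = {
              { 1, 100, 101 }, { 2, 100, 101 }, { 3, 101, 100 }, { 4, 200, 201 },
          };
          int best = pick_ost(cands, 4, used, 1);

          if (best >= 0)
              printf("best candidate: OST%04x\n", cands[best].idx);
          else
              printf("no candidate avoids sharing an OST\n");
          return 0;
      }

      In a real implementation the NID comparisons would presumably come from the import connections referenced above (imp->imp_connection and imp->imp_conn_list), and the avoidance ranking would feed into the existing QoS allocator rather than a standalone selection loop.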


            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Andreas Dilger (adilger)
              Votes: 0
              Watchers: 4
