[LU-15834] FLR: "lfs mirror extend" should take current OSTs into account

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • Lustre 2.14.0
    • None
    • 3

    Description

      When running "lfs mirror extend" to add a mirror to an existing file, the OST(s) used by the original layout appear not to be taken into account if no OST pool or OST indices are explicitly specified:

      $ lfs getstripe /myth/tmp/flr
      /myth/tmp/flr
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      	obdidx		 objid		 objid		 group
      	     1	       2352362	     0x23e4ea	             0
      $ lfs mirror extend -N -c 1 /myth/tmp/flr
      $ lfs getstripe /myth/tmp/flr
      /myth/tmp/flr
        lcm_layout_gen:    1
        lcm_mirror_count:  2
        lcm_entry_count:   2
          lcme_id:             65537
          lcme_mirror_id:      1
          lcme_flags:          init
          lcme_extent.e_start: 0
          lcme_extent.e_end:   EOF
            lmm_stripe_count:  1
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: 1
            lmm_objects:
            - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x23e4ea:0x0] }
      
          lcme_id:             131073
          lcme_mirror_id:      2
          lcme_flags:          init
          lcme_extent.e_start: 0
          lcme_extent.e_end:   EOF
            lmm_stripe_count:  1
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: 1
            lmm_objects:
            - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x23e4eb:0x0] }
      

      In this case, both the original and the mirror copy of the file are allocated on OST0001 because it has the most free space, which helps neither the availability nor the performance of this file.
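To make the failure condition concrete, the C sketch below checks whether the overlapping components of two mirrors share any OST, given the l_ost_idx values that "lfs getstripe" reports for each. The function name mirrors_share_ost() is hypothetical and is not part of Lustre or lfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a Lustre API): return true if any OST index used
 * by one mirror's component also appears in the other mirror's overlapping
 * component. The arrays hold l_ost_idx values as shown by "lfs getstripe".
 */
bool mirrors_share_ost(const unsigned int *ost_a, size_t count_a,
		       const unsigned int *ost_b, size_t count_b)
{
	assert(ost_a != NULL || count_a == 0);
	assert(ost_b != NULL || count_b == 0);

	/* Stripe counts are small, so a nested scan is adequate here. */
	for (size_t i = 0; i < count_a; i++)
		for (size_t j = 0; j < count_b; j++)
			if (ost_a[i] == ost_b[j])
				return true;
	return false;
}
```

For the layout above, mirror 1 and mirror 2 each report a single object with l_ost_idx 1, so this check would return true.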

      A previous patch https://review.whamcloud.com/32404 "LU-9007 lod: improve obj alloc for FLR file" added lod_should_avoid_ost() to handle new OST object allocations for different components of a single mirror, but does not appear to handle allocations across FLR mirrors.
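The C sketch below illustrates the behavior being asked for; it does not reproduce the actual lod_should_avoid_ost() code. The allocator still prefers the OST with the most free space (the reason OST0001 was picked twice above), but skips any OST already used by an overlapping component of another mirror. struct ost_info, ost_in_list() and pick_ost_avoiding() are hypothetical names, and free space is the only selection criterion modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one OST as seen by an allocator. */
struct ost_info {
	unsigned int idx;	/* OST index, e.g. 1 for OST0001 */
	uint64_t free_bytes;	/* available space on this OST */
};

/* True if idx is in the avoid list (OSTs used by other mirrors). */
static bool ost_in_list(unsigned int idx, const unsigned int *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == idx)
			return true;
	return false;
}

/*
 * Pick the OST with the most free space, skipping every OST in the avoid
 * list. Returns the chosen OST index, or -1 if all OSTs must be avoided.
 */
int pick_ost_avoiding(const struct ost_info *osts, size_t n_osts,
		      const unsigned int *avoid, size_t n_avoid)
{
	int best = -1;
	uint64_t best_free = 0;

	for (size_t i = 0; i < n_osts; i++) {
		if (ost_in_list(osts[i].idx, avoid, n_avoid))
			continue;
		if (best < 0 || osts[i].free_bytes > best_free) {
			best = (int)osts[i].idx;
			best_free = osts[i].free_bytes;
		}
	}
	return best;
}
```

Without an avoid list the OST with the most free space wins every time, which reproduces the reported behavior; passing the first mirror's OSTs as the avoid list moves the new object to the next-best OST.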

      Requirements for OST selection on components with overlapping extents should be, in order of decreasing priority:

      1. Objects of overlapping components must not share the same OST. This implies mirror_count <= ost_count / component_stripe_count. In theory, this could be relaxed if all replicas have the same stripe_count and stripe_size: it would then only be required that the same OST not appear at the same stripe_index in different components, in which case the maximum replica count equals the OST count, but this is more difficult to control.
      2. Objects of overlapping components should not share OSTs on the same OSS node (identified by the NID from imp->imp_connection->c_peer.nid, as qos_add_tgt() does), to avoid the shared node failure domain.
      3. Objects of overlapping components should not share OSTs on the same OSS failover pair (identified by the failover NID from imp->imp_conn_list.oic_conn->c_peer.nid, as lprocfs_import_seq_show() does), to avoid the shared storage enclosure/controller failure domain. There may be other OSS nodes that share the same storage enclosure/controller, but the client has no way to determine this automatically.
      4. Objects of overlapping components should not be on OSTs on the same network switch, power supply, rack, etc., but this depends on external information that is not currently available to Lustre. That information could optionally be added via a separate configuration file or options, but the cases above will automatically cover the most important failure scenarios.
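A minimal C sketch of requirements 1-3 as a tiered choice, assuming each OST's OSS NID and failover pair are already known. Here they are plain integers in a hypothetical struct ost_loc; in Lustre they would come from the import connection data named above. Rule 1 is a hard constraint; rules 2 and 3 are preferences weighted so that avoiding a shared OSS node always outranks avoiding a shared failover pair; free space only breaks ties. Requirement 4 is omitted because it needs external information. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-OST attributes; not a real Lustre structure. */
struct ost_loc {
	unsigned int idx;	/* OST index */
	uint64_t oss_nid;	/* NID of the serving OSS node (rule 2) */
	unsigned int pair_id;	/* OSS failover pair, assumed known (rule 3) */
	uint64_t free_bytes;	/* free space, used only as a tie-breaker */
};

/* Rule 1 bound: mirror_count <= ost_count / component_stripe_count. */
unsigned int max_mirror_count(unsigned int ost_count, unsigned int stripe_count)
{
	assert(stripe_count > 0);
	return ost_count / stripe_count;
}

/*
 * Penalty for placing an object on cand, given the OSTs already used by
 * overlapping components. Returns -1 if cand is forbidden (rule 1);
 * otherwise lower is better, and a shared OSS node (weight 2) always
 * outweighs a shared failover pair alone (weight 1).
 */
static int placement_penalty(const struct ost_loc *cand,
			     const struct ost_loc *used, size_t n_used)
{
	bool same_oss = false, same_pair = false;

	for (size_t i = 0; i < n_used; i++) {
		if (cand->idx == used[i].idx)
			return -1;
		if (cand->oss_nid == used[i].oss_nid)
			same_oss = true;
		if (cand->pair_id == used[i].pair_id)
			same_pair = true;
	}
	return (same_oss ? 2 : 0) + (same_pair ? 1 : 0);
}

/* Return the index of the best allowed OST, or -1 if none is allowed. */
int pick_ost_tiered(const struct ost_loc *osts, size_t n_osts,
		    const struct ost_loc *used, size_t n_used)
{
	int best = -1, best_pen = 0;
	uint64_t best_free = 0;

	for (size_t i = 0; i < n_osts; i++) {
		int pen = placement_penalty(&osts[i], used, n_used);

		if (pen < 0)
			continue;
		if (best < 0 || pen < best_pen ||
		    (pen == best_pen && osts[i].free_bytes > best_free)) {
			best = (int)osts[i].idx;
			best_pen = pen;
			best_free = osts[i].free_bytes;
		}
	}
	return best;
}
```

With this weighting, an OST on a different OSS in a different failover pair is chosen even when another OST on the already-used OSS has far more free space. The weights are one possible design choice; a strict lexicographic comparison of (rule 2, rule 3, free space) would behave the same way.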

          Activity

            gerrit Gerrit Updater added a comment -
            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/56852/
            Subject: LU-15834 lfs: rid of global variable "error_loc"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 32eeb96e96446003e6087ef494d8deca01885796

            gerrit Gerrit Updater added a comment -
            "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56852
            Subject: LU-15834 lfs: rid of global variable "error_loc"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bbc90cd435602b3eb0c8813c89cb79f5e2434d5e

            adilger Andreas Dilger added a comment -
            I saw in sanity-flr test_49a that it failed because both mirrors were on the same OST:
            https://testing.whamcloud.com/gerrit-janitor/46228/testresults/sanity-flr-special5-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/

            gerrit Gerrit Updater added a comment -
            "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50622
            Subject: LU-15834 lfs: mirror extend take current OSTs into account
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0c1dad7e0e9b533516637e3aa457ad9d263c30dc
            bobijam Zhenyu Xu added a comment -

            I think not. When "lfs mirror extend" is run without specifying a victim_file, mirror_extend_layout() just creates a volatile file using the basic stripe_size/stripe_count from the source file. Object allocation for the volatile file does not consider the objects of the source file, and the mirror merge just appends the volatile file's layout component to the source file as a new mirror.
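One possible userspace direction, sketched in C under assumptions: lfs would first collect the OST indices used by the source file's layout, then restrict the volatile file's allocation to the remaining indices. Only the set computation is shown; ost_candidates_excluding() is a hypothetical helper, and this is not necessarily what patch 50622 does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: write into out[] the OST indices in [0, ost_count)
 * that the source file does not use, in ascending order. Returns the number
 * of indices written; out must have room for ost_count entries.
 */
size_t ost_candidates_excluding(unsigned int ost_count,
				const unsigned int *src_osts, size_t n_src,
				unsigned int *out)
{
	size_t n_out = 0;

	assert(out != NULL || ost_count == 0);
	for (unsigned int idx = 0; idx < ost_count; idx++) {
		bool used = false;

		/* Skip any index already holding an object of the source file. */
		for (size_t i = 0; i < n_src; i++) {
			if (src_osts[i] == idx) {
				used = true;
				break;
			}
		}
		if (!used)
			out[n_out++] = idx;
	}
	return n_out;
}
```

If fewer candidates remain than the new mirror's stripe count, requirement 1 cannot be satisfied and the caller would have to fail or relax the constraint.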

            adilger Andreas Dilger added a comment -
            Is this also fixed by the LU-15841 patch?

            People

              bobijam Zhenyu Xu
              adilger Andreas Dilger
              Votes: 0
              Watchers: 4
